Abstract

An index-based object recognition system using geometric invariance techniques
has been designed and used to recognize buildings in an image of a military site and
curved planar objects, including gaskets. New invariants and indexing
techniques have been developed for polyhedral and curved objects with repetition or
bilateral symmetry, and for objects with the imaged outline of a surface of revolution. A method
to distinguish projectively equivalent but Euclidean distinct objects in an uncalibrated
view has been investigated. A group-theoretic framework for relating quasi-invariants to
invariants has been formulated.

Computing invariants can be formulated as an algebraic manipulation problem involving
variable elimination and solving nonlinear polynomial equations. Based on Dixon’s
formulation of resultants, new methods for eliminating variables have been developed
and implemented. These methods are much faster than, and superior to, other elimination
techniques. A branch and prune approach for numerically solving polynomial equations
has been developed. A simple algorithm for separating invariant relations among object
and image features to compute invariants of object features has been designed. These
algorithms can serve as a basis for building an invariant work-bench that would enable
researchers to experiment with geometric configurations and investigate their
invariants.
1 Introduction


Geometric and algebraic invariants have become an active research area in computer vision
as they enable view-independent classification and recognition of objects.^1 This approach
is particularly attractive because it is scalable and does not rely on camera geometries.

A serious deficiency of most recognition systems is their inefficiency at identifying objects
in a scene. Most systems attempt to match every object that could be in a scene (i.e.,
the entire model library) to all combinations of features in an image. An invariant-based
approach can enable a recognition system to work with a large model base, which has
not been possible until now. An invariant-based approach can also be used for model
acquisition from images.

During this contract, a number of research activities related to the computation and use
of geometric and algebraic invariants in object recognition systems have been initiated.
Mundy, Forsyth and Zisserman have developed an index-based object recognition system
[20]. The system has been used effectively to recognize buildings in an image of a
military site as well as curved planar objects, including gaskets. It was
demonstrated using live images at the Second European Conference on Computer Vision
in Italy. To make the recognition system work on a larger class of objects,
including curved objects, new invariants and indexing techniques for polyhedral and curved
objects with repetition or bilateral symmetry, as well as for objects with the imaged
outline of a surface of revolution, have been developed and integrated into the recognition
system. Consistency conditions that allow objects that are Euclidean distinct but
projectively equivalent to be distinguished in an uncalibrated view have been incorporated into
the geometric invariance techniques.

Another important result of the current contract has been the development
of a theoretical framework for studying quasi-invariants and analyzing their relationship
with invariants [1]. Quasi-invariants were introduced by Binford as an alternative to
the strict requirements of invariants.^2 Quasi-invariants are properties that need not be
invariant for the whole set of imaging transformations, but are invariant locally for useful
portions of the view space.

Discovering invariants is an art. Vision researchers have mostly used invariants
that were discovered by geometers during the last century. Computing
invariants can be formulated as an algebraic manipulation problem involving the elimination
of many variables from polynomial representations of algebraic relationships among object
and image features, and extracting invariants from the results of elimination. Symbolic
and numerical methods for manipulating nonlinear polynomial equations are helpful in
identifying invariant relations for a geometric configuration and hypothesizing invariants
based on them.

Based  on  Dixon’s  formulation  of  multivariate  resultants,  new  methods  for  eliminating 
variables  efficiently  have  been  developed  and  implemented  [15].  In  comparison  to  other 

^1 J.L. Mundy and A. Zisserman (eds.), Geometric Invariance in Computer Vision (eds. Mundy and Zisserman),
MIT Press, 1992, 1-39. J.L. Mundy, A. Zisserman, and D.A. Forsyth (eds.), Applications of Invariance
in Computer Vision, LNCS 825, Springer-Verlag, 1994.

^2 T. Binford and T.S. Levitt, “Quasi-invariants: Theory and Exploitation,” Proc. DARPA Image
Understanding Workshop, Washington D.C., 1993.
elimination methods, including Macaulay and sparse resultants, Gröbner basis methods
and triangulation methods, a method based on the Dixon formulation is shown to be
far superior in both time and space on a variety of problems from many application
domains, including vision, invariant computation, robotics and kinematics [13]. Recent
theoretical investigations have revealed that methods based on Dixon’s formulation are
superior because Dixon’s formulation implicitly exploits the sparsity of polynomial systems
[14].

In collaboration with David McAllester and Pascal Van Hentenryck, a branch and
prune algorithm has been developed to numerically find isolated solutions of polynomial
constraints [19]. The algorithm has been implemented by Van Hentenryck as the Newton
system, and has been tried on many examples. Despite its simplicity, the method
outperforms conventional interval Newton methods and homotopy-based continuation
methods.

A simple algorithm for separating invariant relations among object and image features
has recently been designed. This algorithm checks whether a given invariant relation
among  object  and  image  features  is  separable,  and  if  so,  computes  an  invariant  expression 
in  terms  of  object  features. 

We thus have state-of-the-art methods and implementations for solving polynomial
equations both numerically and symbolically. These advances put us in an excellent
position to work towards an invariant work-bench, a software system that would enable
researchers to experiment with geometric configurations and investigate their geometric
invariant and quasi-invariant relations with respect to different transformation groups.
Planned future activities include the development of such an invariant work-bench,
containing the necessary tools for constructing and manipulating invariants and quasi-invariants.

In the next two sections, we provide some specific details about the main results
achieved in geometric invariance and indexing techniques as well as in symbolic and
numeric computation methods. Section 4 is the educational impact statement. Section 5
is a list of the published papers citing the contract. Copies of the papers discussing the
results in detail are included in the Appendix.


2  Geometric  Invariance  and  Object  Recognition 

Most of the work in object recognition focussed on expanding the range of objects that
could be recognized using the techniques of geometric invariance. New indexing techniques
for various families of curved surfaces were developed; in particular, it was proposed that
repetition or symmetry in the geometry of an object yields an extended view of that object,
and hence can produce an accurate reconstruction. Geometric invariance techniques have
been extended to include a consistency argument that allows objects that are Euclidean
distinct but projectively equivalent to be distinguished in an uncalibrated view.

The ideas underlying object recognition using invariance are widely scattered
across a growing literature. In [20], the evolution of the geometric invariant approach to
object recognition is elaborated. The paper also discusses the relative usefulness of the
techniques for constructing invariants of three-dimensional objects from a single perspective
view, and the assumptions under which they work. The design of a control architecture for
a system that can efficiently exploit and integrate currently available representational
techniques is presented. For example, there are classes for planar objects, symmetric
point sets, canal surfaces, and surfaces of revolution. In [7], additional issues in recovering
representations are discussed; it is argued that the importance of markings on objects has
been underrated to date.

2.1  Indexing  techniques  for  curved  surfaces 

It has traditionally been harder to recognize curved objects in images because of the complex
relationship between the geometry of the object and what appears in the
image. In a variety of situations, constraints on the geometry of the surface make it
possible to infer surface geometry from image information alone. It has been shown in
a series of papers [5, 6] that in the case of extruded surfaces, the surface’s projective
geometry is in fact extremely sparse and can be captured by a plane curve and a line
on that curve’s plane. For many surfaces, this configuration, which completely represents
the surface, generates indexing functions. Most importantly, when surface geometry has
been inferred, markings on the surface - edges, grey levels or color values (the property
that distinguishes between different soft drink cans, for instance) - can be mapped back
onto the recovered surface and used to generate a more distinctive representation.

It  was  shown  that  it  is  possible  to  recover  the  complete  projective  geometry  of  a 
generic  algebraic  surface  from  a  single,  uncalibrated  image  taken  from  a  single,  unknown 
viewpoint  [3].  This  constructive  result  is  counterintuitive  and  is  also  mathematically 
intricate.  Unfortunately,  experimental  work  suggests  that  the  practical  consequences  of 
this  result  are  few,  as  the  recognition  process  involves  solving  polynomial  systems  of  very 
high  degree  with  great  accuracy  and  constructing  a  robust  fitting  technique. 

New invariants have been developed for curved objects with bilateral symmetry, objects with
the imaged outline of a surface of revolution, and straight homogeneous generalized cylinders,
using constructions based on bi-tangents to the surface [16]. The class of
surfaces consists of those generated as the envelope of a sphere of varying radius swept along an
axis. The class includes canal surfaces and surfaces of revolution. The representations are
computed using only image information, from the symmetry set of the object’s outline.
They are invariant under weak-perspective imaging, and quasi-invariant to an excellent
approximation under perspective imaging. Objects in the same class can be discriminated
under perspective projection.

Multiple views of a single object lead to depth information through a process of matching
points and triangulation. A single view of an object that has one of a variety of
symmetries is conceptually equivalent to a series of views of part of that object; for example, a
single view of an object with a twofold symmetry is equivalent to two views of half of that
object. We have explored the interactions between object symmetries and the ambiguity
in the reconstructed representation available in these kinds of situations.


2.2  Consistency  techniques 

Indexing by projective invariants, while efficient, has the unfortunate result that
projectively equivalent objects cannot be distinguished, as they generate the same
representation. This means that the techniques cannot distinguish between distinct objects related
by a projective transformation, for example, boxes of different sizes. In [4], it is argued
that this difficulty disappears when more than one object is in view. The ambiguity is
resolved because a recognition hypothesis for a single object implicitly contains
information about the internal parameters of the camera viewing that object. If two or more
objects are present, these hypotheses must be consistent. A recognition system using
this view can index the projective structure of objects using projective invariants. It can
then search over Euclidean distinct, projectively equivalent hypotheses for consistent systems
of hypotheses. Euclidean ambiguities about objects are largely a property of the model
library, so conditions that give rise to such ambiguities can be analyzed and used to
distinguish correct systems of hypotheses about Euclidean distinct objects from incorrect
ones.
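
The ambiguity can be seen in a one-dimensional analogue. The sketch below is illustrative only: it uses the cross-ratio of four collinear points as a stand-in for the indexing functions of the recognition system, and the point coordinates, map coefficients and function names are assumptions chosen for the example. Two point sets with different spacings (hence Euclidean distinct) that are related by a projective map receive the same index.

```python
from fractions import Fraction as F


def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by 1D coordinates."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))


def projective_map(t, p, q, r, s):
    """1D projective transformation t -> (p*t + q) / (r*t + s)."""
    return (p * t + q) / (r * t + s)


# Hypothetical model: four collinear feature points.
model = [F(0), F(1), F(3), F(7)]
# A second, projectively equivalent "object" (coefficients chosen arbitrarily).
other = [projective_map(t, F(2), F(1), F(1), F(5)) for t in model]

# The spacings differ, so the two point sets are Euclidean distinct ...
gaps_model = [b - a for a, b in zip(model, model[1:])]
gaps_other = [b - a for a, b in zip(other, other[1:])]

# ... but the projective index is identical for both.
index_model = cross_ratio(*model)
index_other = cross_ratio(*other)
```

Because both sets index to the same value, a lookup keyed on the cross-ratio alone would return both models; additional information, such as the consistency conditions described above, is needed to choose between them.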

2.3  Quasi-invariants 

Based  on  pragmatic  considerations,  Binford  introduced  quasi-invariants  as  an  alternative 
to  the  strict  requirements  of  invariants.  Quasi-invariants  are  properties  which  need  not 
be  invariant  for  the  whole  set  of  imaging  transformations,  but  are  invariant  locally  for 
useful  portions  of  the  view  space.  The  advantage  is  that  many  more  quasi-invariants  are 
available  for  a  given  set  of  geometric  features  than  projective  invariants. 
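
As a loose numerical picture of a property that is invariant over only part of a transformation set, the sketch below tracks the ratio of two collinear segment lengths under the 1D map t -> (p*t + q)/(r*t + s). The ratio is preserved exactly when r = 0 (an affine map), changes little for a mildly projective map, and changes substantially for a strongly projective one. This is an illustration under assumed coefficients and names; it does not implement the formal definition of a quasi-invariant or the framework of [1].

```python
from fractions import Fraction as F


def length_ratio(a, b, c):
    """Ratio of the segment lengths |ab| / |bc| for collinear points a < b < c."""
    return (b - a) / (c - b)


def image(t, p, q, r, s):
    """1D projective map; it is affine when r == 0."""
    return (p * t + q) / (r * t + s)


points = (F(0), F(1), F(3))
reference = length_ratio(*points)


def ratio_after(r):
    """Length ratio after mapping the points with projective parameter r."""
    mapped = [image(t, F(2), F(1), r, F(5)) for t in points]
    return length_ratio(*mapped)


affine = ratio_after(F(0))       # affine map: ratio unchanged
mild = ratio_after(F(1, 100))    # nearly affine: small drift
strong = ratio_after(F(1))       # strongly projective: large drift
```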

Quasi-invariants are not well understood; in particular, their relation to invariants is
unclear. A group-theoretic framework for relating quasi-invariants to invariants, as well
as for studying the variability of quasi-invariants, has been developed [1]. This research
would make it possible to use invariants whenever they exist and are easy to compute, and
to augment them with quasi-invariants otherwise.

A  quasi-invariant  is  defined  in  terms  of  three  elements: 

•  a  geometric  configuration  and  associated  property  of  the  configuration, 

•  a  distinguished  set  of  image  transformations,  and 

•  a  specific  transformation  in  the  set  of  image  transformations. 

Different properties of configurations can serve as quasi-invariants at different points in
the transformation group. The set of image transformations need not be a group, but all
known quasi-invariants are defined with respect to group invariants. All the examples of
quasi-invariants used by Binford can be studied using this group-theoretic framework.


3 Symbolic and Numeric Computation Methods

3.1  Resurrecting  Dixon  Resultants 

While writing an introductory article on elimination methods [10], we came across papers
by Dixon written in the early 1900’s in which he discussed his attempts to extend the
Bezout-Cayley method for formulating resultants for eliminating a single variable from two
polynomials. In one of the papers, Dixon succeeded in giving a method for eliminating
two variables from three generic bi-degree polynomials.^3 However, when the polynomials are
specialized to have particular coefficients, Dixon’s method does not work. Thus the
main limitation of Dixon’s approach was that it worked for a very small and restricted class
of polynomial systems. This problem of Dixon’s approach leading to singular resultant
matrices almost always occurs for polynomial systems as the number of variables increases.
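
For two polynomials in one variable, the Bezout-Cayley construction that Dixon set out to generalize can be sketched as follows. The matrix of coefficients of (f(x)g(a) - f(a)g(x))/(x - a) is built, and its determinant vanishes when f and g have a common root. This covers the classical univariate case only and is not the extended Dixon method of [15]; the function names and the example system (a circle and a line, with y treated as a parameter) are illustrative assumptions.

```python
from fractions import Fraction


def bezout_matrix(f, g):
    """Cayley's form of the Bezout matrix of two univariate polynomials.

    f and g are coefficient lists in ascending order, padded to the same
    length d + 1. Entry [i][j] is the coefficient of x**i * a**j in
    (f(x)*g(a) - f(a)*g(x)) / (x - a).
    """
    d = len(f) - 1
    B = [[Fraction(0)] * d for _ in range(d)]
    for i in range(1, d + 1):
        for j in range(i):
            c = Fraction(f[i]) * g[j] - Fraction(f[j]) * g[i]
            # (x**i a**j - x**j a**i)/(x - a) = sum_k x**(j+k) * a**(i-1-k)
            for k in range(i - j):
                B[j + k][i - 1 - k] += c
    return B


def determinant(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n = len(M)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n):
                M[r][k] -= factor * M[col][k]
    return det


def eliminant_at(y):
    """Eliminate x from x**2 + y**2 - 25 = 0 and x - y - 1 = 0 at a given y."""
    y = Fraction(y)
    f = [y * y - 25, 0, 1]   # x**2 + (y**2 - 25)
    g = [-y - 1, 1, 0]       # x - (y + 1), padded to degree 2
    return determinant(bezout_matrix(f, g))
```

Substituting x = y + 1 into the circle gives 2y^2 + 2y - 24 = 2(y + 4)(y - 3), so `eliminant_at` returns zero exactly at y = 3 and y = -4, the y-coordinates of the two intersection points. Evaluating the determinant at sample values of y, as here, is also the starting point of an interpolation-based computation of the eliminant.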

We have been successful in extending Dixon’s approach to work on a large class of
polynomial systems. In [15], we showed how Dixon’s formulation of multivariate resultants
can be extended using rank submatrix computation, a simple linear algebra technique.

This construction also works for other determinantal formulations of resultants in which
the matrices could be singular. As an example, the technique of rank submatrix computation
can be used as an alternative to perturbation techniques for singular Macaulay matrices,
and it has been found to be more efficient in practice than the generalized characteristic
polynomial discussed by Canny, as there is no need to introduce an extra variable.

We have implemented an interpolation-based approach for computing a projection
operator based on the extension of Dixon’s method discussed above. This implementation
has been tried on many examples from application domains including robotics and inverse
kinematics, geometric and solid modeling, engineering design, vision, computational
biology, as well as geometric theorem proving and problem solving. Using this extension,
we have been able to solve many polynomial systems which could not be solved, using
reasonable computational time and space, with any other method, including Macaulay
resultants, sparse resultants, the Gröbner basis method, and Ritt-Wu’s characteristic set
method. We are very excited by this development since it appears to be the most efficient
method for computing the projection operator in practice.

We have been able to show that, in contrast to Macaulay resultants, in which the size
of the Macaulay matrix for a polynomial system is related to its Bezout bound, the Dixon
formulation is not governed by the Bezout bound [14]. Instead, the size of the Dixon matrix
for a polynomial system can be expressed in terms of the volume of the polytopes of the
polynomial system. The Dixon formulation implicitly exploits the sparsity of polynomial
systems. In particular, it is shown that the size of the Dixon matrix of a system of
polynomials with the same set of terms is bounded by the number of integral points inside
the Minkowski sum of successive projections of the Newton polytopes of the polynomials. Thus
the size of the Dixon matrix is much smaller than the size of the Macaulay matrix; further, the
size of the Dixon matrix is also smaller (by an exponential factor) than the size of the sparse
resultant matrix, which is bounded by the number of integral points inside the Minkowski
sum of the Newton polytopes (not their successive projections) of the polynomials. It is
proved that even though entries in a Dixon matrix are somewhat complicated, in contrast
to entries in the Macaulay matrix and the sparse resultant matrix, which are either
0 or coefficients of terms in the polynomials, the Dixon matrix can be constructed fast using
dense interpolation techniques. These results serve as a theoretical justification for the
practical superiority of the method based on the Dixon formulation over other methods.

^3 A.L. Dixon, “The eliminant of three quantics in two independent variables,” Proc. London Mathematical
Society, 6, 1908, 468-478.
3.2 Branch and Prune Approach for Numerically Solving Polynomial Constraints

In collaboration with David McAllester and Pascal Van Hentenryck, we have developed a
branch and prune based approach for numerically solving polynomial constraints (equalities
as well as inequalities) using interval computations [19]. The approach can be viewed
as a global search method which uses interval computations for numerical accuracy and
prunes the search space using local methods. As a first approximation, each polynomial is
successively viewed as a univariate polynomial in each variable with interval coefficients,
and the interval for each variable is narrowed to ensure that there is a solution on the
boundary of the interval for each variable. When an interval for a variable cannot be
narrowed using unary interval constraints, branching is performed by splitting the
interval into equal halves, and this process of narrowing the intervals is repeated until no
solution can be found within an interval or a sufficiently small interval (determined by the
desired level of accuracy) is identified.

The approach can be parameterized by different transformations for pruning the
intervals. The Newton system currently uses three such transformations: (i) the natural
formulation of the constraints, viewed as functions of a single variable by fixing the values
of the other variables as the intervals available as current guesses; (ii) a distributed
formulation, in which a normal form of a constraint is used; and (iii) for convergence and
accuracy near a solution, a Taylor series expansion of a constraint around the center of each
interval. The first two transformations quickly prune the interval when it is too large and
thus far away from a solution, whereas the third transformation is helpful in convergence
and in avoiding the generation of small intervals around solutions.
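
The basic branch and prune loop can be sketched in a few lines. The sketch below is a deliberately reduced version and not the Newton system: pruning uses only a plain interval evaluation of each constraint over the whole box, there is no interval narrowing and no Taylor-based step, and ordinary floating point is used without outward rounding, so the enclosures are not rigorous. The example system (the unit circle intersected with the line x = y), the tolerance and all names are assumptions made for the illustration.

```python
def iadd(a, b):
    """Sum of two intervals given as (lo, hi) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def isub(a, b):
    """Difference of two intervals."""
    return (a[0] - b[1], a[1] - b[0])


def imul(a, b):
    """Product of two intervals."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def constraint_ranges(x, y):
    """Interval enclosures of x**2 + y**2 - 1 and x - y over the box (x, y)."""
    circle = isub(iadd(imul(x, x), imul(y, y)), (1.0, 1.0))
    line = isub(x, y)
    return (circle, line)


def branch_and_prune(box, eps=1e-6):
    """Return the boxes narrower than eps that could not be pruned."""
    pending, small_boxes = [box], []
    while pending:
        x, y = pending.pop()
        # Prune: some constraint cannot be zero anywhere in this box.
        if any(lo > 0.0 or hi < 0.0 for lo, hi in constraint_ranges(x, y)):
            continue
        width_x, width_y = x[1] - x[0], y[1] - y[0]
        if max(width_x, width_y) < eps:
            small_boxes.append((x, y))
        elif width_x >= width_y:
            # Branch: split the wider interval into equal halves.
            mid = 0.5 * (x[0] + x[1])
            pending += [((x[0], mid), y), ((mid, x[1]), y)]
        else:
            mid = 0.5 * (y[0] + y[1])
            pending += [(x, (y[0], mid)), (x, (mid, y[1]))]
    return small_boxes


boxes = branch_and_prune(((-2.0, 2.0), (-2.0, 2.0)))
```

The surviving boxes cluster around the two solutions x = y = ±sqrt(1/2). Several small boxes typically remain near each solution, which is the behavior that the third transformation described above is intended to avoid.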

The approach is quite simple, from the viewpoints of both mathematics and programming.
It has been implemented and tried, with remarkable success, on many benchmark
examples from different application domains including robotics, kinematics, engineering,
vision and chemistry.

3.3  Solving  Parametric  Polynomial  Equations 

We have developed a simple but powerful approach for solving a parametric system of
polynomial equations (in which parameters are distinguished from variables) [9]. The Gröbner
basis algorithm and the characteristic set triangulation algorithm have been extended to
generate a (finite) family of solved forms. Each solved form can be used to generate
solutions of the parametric system for a subset of parameter values expressed as a finite set of
constraints. The solution sets are classified for different parameter values depending on
whether there are no solutions, finitely many solutions, or infinitely many solutions. The
extension is based on a constraints paradigm, in which parametric relations are viewed as
constraints. The algorithms incrementally partition the parameter space and provide useful
structural information about solutions of a nonlinear polynomial system for different
ranges of parameter values.
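
A one-variable toy case conveys the kind of output described above. For the parametric equation a*x = b, with parameters a and b and variable x, the parameter space splits into three regions, each with its own solved form. The sketch below hard-codes this partition by hand; it is not the extended Gröbner basis or characteristic set algorithm of [9], and the function name and labels are hypothetical.

```python
from fractions import Fraction


def classify_linear(a, b):
    """Classify the solutions in x of a*x = b for given parameter values.

    Returns (kind, solution), where kind is "finite", "infinite" or "none".
    """
    if a != 0:
        # Constraint a != 0: solved form x = b/a, exactly one solution.
        return ("finite", Fraction(b) / Fraction(a))
    if b == 0:
        # Constraints a == 0 and b == 0: every x is a solution.
        return ("infinite", None)
    # Constraints a == 0 and b != 0: the equation is inconsistent.
    return ("none", None)


examples = {
    (2, 3): classify_linear(2, 3),
    (0, 0): classify_linear(0, 0),
    (0, 5): classify_linear(0, 5),
}
```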

An understanding of the solutions of a parametric system is useful in a wide variety of
applications including computer vision, computer aided design and modeling. For different
parameter subspaces, the solution set may behave differently, so that different
optimizations may be possible. The constraint-based approach for parametric systems
may be useful for solving systems of polynomials with floating point coefficients using
symbolic techniques. Algorithms for solving parameterized systems would be particularly
relevant for constructing quasi-invariants, where a goal is to construct rational functions
of object parameters which do not vary locally for a large range of imaging parameters.

3.4  Other  results 

A method for converting a Gröbner basis with respect to a degree ordering to a Gröbner
basis with respect to lexicographic order has been developed for positive-dimensional ideals
using linear algebra techniques [12]. A main reason for interest in such a basis conversion
algorithm is that Gröbner basis computation is much faster using a degree term ordering
than a lexicographic term ordering. A lexicographic Gröbner basis is, however, a
lot more useful in gaining insight into the structure of the zeros of a polynomial system.

We  resurrected  Ritt’s  concept  of  characteristic  sets  [8]  and  demonstrated  that  it  is 
quite  different  from  the  concept  of  characteristic  set  that  Wu  has  popularized  in  the 
context  of  geometry  theorem  proving  as  well  as  for  equation  solving.  We  proved  many 
interesting  characterization  theorems  for  systems  of  polynomials  to  be  characteristic  sets. 
Applications of this approach to solving equations and to computing the structure of
the zero sets of polynomials have been investigated.

Results about basis conversion as well as characteristic set conversion have been
implemented in our experimental software GEOMETER, a system for algebraic and symbolic
manipulation. Implementations of the Gröbner basis algorithm as well as the characteristic
set construction algorithm in GEOMETER run much more efficiently than the
implementations of these algorithms in commercially available computer algebra systems including
Maple, Mathematica, and Macsyma. We have paid considerable attention to efficiently
coding the primitive operations in GEOMETER.

Distributed implementations of homotopy techniques based on continuation for
numerically solving equations have also been experimented with.^4 Whereas conventional
homotopy methods use the Bezout bound for constructing an initial system whose solutions
are traced to find solutions of a given system, the homotopy discussed by Verschelde et al.
uses the Bernstein bound, which depends on the Newton polytopes of the polynomials in
the system and is thus known to be much smaller for sparse polynomial systems. Our
experience with the homotopy method based on Newton polytopes was not very positive;
the method was found to be inefficient and slow since the construction of an initial system
is recursive and quite complicated. We have instead used a homotopy method discussed
by Morgan on a number of examples, including examples from computer vision, invariants
and robotics. The distributed implementation runs on a network of Sparc
workstations, since computations for different paths corresponding to different solutions
can proceed concurrently.
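
The gap between the two bounds can be checked directly for two polynomials in two variables, where the Bernstein bound is the mixed volume area(P + Q) - area(P) - area(Q) of the Newton polygons P and Q, and the Bezout bound is the product of the total degrees. The sketch below computes both from the exponent vectors of the terms. It only compares the bounds and does not construct or track any homotopy; the helper names and the two example supports are illustrative assumptions.

```python
from fractions import Fraction


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area(points):
    """Area of the convex hull of the points (shoelace formula)."""
    hull = convex_hull(points)
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return Fraction(abs(twice), 2)


def minkowski_sum(A, B):
    """All pairwise sums of exponent vectors from A and B."""
    return [(a[0] + b[0], a[1] + b[1]) for a in A for b in B]


def bernstein_bound(A, B):
    """Mixed volume of two planar Newton polygons."""
    return area(minkowski_sum(A, B)) - area(A) - area(B)


def bezout_bound(A, B):
    """Product of the total degrees of the two polynomials."""
    return max(i + j for i, j in A) * max(i + j for i, j in B)


# Exponent vectors (i, j) of the terms x**i * y**j.
bilinear = [(0, 0), (1, 0), (0, 1), (1, 1)]                       # sparse
dense_quadric = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # dense
```

For two bilinear polynomials the Bezout bound is 4 while the Bernstein bound is 2; for two dense quadrics both bounds equal 4, so the saving appears only when the system is sparse.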


^4 A.P. Morgan, Solving Polynomial Systems Using Continuation for Scientific and Engineering Problems,
Prentice Hall, NJ, 1987. J. Verschelde, P. Verlinden, and R. Cools, “Homotopies exploiting Newton polytopes
for solving sparse polynomial systems,” SIAM J. of Numerical Analysis, 31(3), 915-930, 1994.
4 Educational Impact

One undergraduate student (Brian Levine), four graduate students (Tushar Saxena, Qing
Guo, N. Sundaram, and Sreenivasa Viswanadha), and two postdoctoral researchers (Dr. Xumin Nie
and Dr. Mohsin Ahmed) were partially supported by this contract. The work on Dixon
resultants supported under this grant will serve as a basis for Saxena’s Ph.D. dissertation.
A graduate-level course on reasoning uses considerable material based on research
in symbolic computation done for this contract. Another graduate-level course on advanced
computer vision is based on results on geometric invariance and indexing techniques for
model-based object recognition.

A number of papers have been presented at conferences and published in journals.
Many invited presentations have been given at various conferences, as well as at academic
institutions and research laboratories. A paper by Rothwell, Forsyth, Mundy and Zisserman
received the Marr Prize, the most prestigious award in computer vision. Forsyth won the
NSF Young Investigator Award. Kapur has been invited to give a tutorial on algorithmic
elimination methods at the International Symposium on Symbolic and Algebraic Computation
(ISSAC) to be held in Montreal in July 1995.

We participated in a seminar on Invariance in Object Recognition for DOD Applications
at Wright Patterson Lab, Nov. 18-19, 1992. On the first day, Kapur and Mundy
gave  a  tutorial  on  invariant  theory  and  the  use  of  invariants  in  image  understanding.  On 
the  second  day,  Binford  and  Levitt  of  Stanford  University  discussed  quasi-invariants  and 
their  use  in  object  recognition.  The  seminar  was  attended  by  over  50  researchers  from 
industry,  government  laboratories  and  universities.  Interactions  at  the  seminar  led  to 
an  initiative  based  on  thermal  and  geometric  invariants  for  automatic  target  recognition. 
The  discussion  at  the  seminar  about  quasi-invariants  led  us  to  investigate  a  theoretical 
framework  for  quasi-invariants  and  relate  them  to  invariants,  which  resulted  in  [1]. 

We have begun interacting with Dr. Robert Williams’ group in the Mission Avionics
Technology Department at the Naval Air Development Center at Warminster, PA. We have
been discussing Dixon resultants and exploring their use for trajectory problems. Their
group has been using our algorithms based on the Dixon formulation in their work.^5


^5 McShane, Nakos and Williams, “The Kapur-Saxena-Yang Dixon Resultant with Maple and Mathematica,”
presented at First Intl. IMACS Conf. on Applications of Computer Algebra, Albuquerque, New Mexico, May
1995.
5 Publications citing the Contract


Below we list all the papers appearing in conferences, books and journals citing the
contract. A copy of each of these papers is included in the Appendix.

Abstract

Invariants  have  recently  been  found  useful  for  object 
recognition  and  indexing  into  model-based  vision  sys¬ 
tems.  For  many  geometric  configurations,  invariants 
may  not  exist.  Quasi-invariants  are  properties  which 
need  not  be  invariants  for  a  whole  set  of  imaging 
transformations,  but  are  invariant  locally.  A  frame¬ 
work  for  studying  quasi-invariants  is  proposed,  and 
is  used  to  study  relationships  between  invariants  and 
quasi-invariants. 

1  Introduction 

There  has  recently  been  considerable  interest  in 
studying invariants and related properties of geometric
configurations for object recognition and indexing
into  model-based  vision  systems.  The  main  attrac¬ 
tion  of  geometric  invariants  for  computer  vision  is 
their  independence  of  camera  calibration  and  cam¬ 
era  viewpoint.  Thus,  a  data  base  of  objects  can  be 
efficiently  organized  and  indexed  in  terms  of  invari¬ 
ant  values.  Since  the  invariant  index  does  not  de¬ 
pend  on  viewpoint,  the  object  description  can  be  ob¬ 
tained  from  a  single  view  of  the  object  without  man¬ 
ual  model  construction. 

For a discussion of recent developments in the application
of geometric invariance in vision, the reader
may consult [4]. Fortunately, geometric invariants
have been extensively studied by mathematicians and
constructive algebraists for at least the last two centuries.
Invariants are mathematically associated with
a group of transformations, and invariants for different
groups of planar transformations (Euclidean,
orthographic, equiform, affine, projective, and topological)
have been investigated. Invariants have also been
developed for 3D surface classes such as surfaces
of revolution, symmetrical objects and algebraic surfaces.

Perspective transformations (central projection) do
not constitute a group: the composition of two planar
perspectivities is not necessarily a perspectivity.
On the other hand, the theory of invariants is
most fully developed for the case of group transformations.
Perspective transformations are a subset
of projective transformations, which do constitute a
group. Therefore projective invariants can be used
to index a model database for recognition under
perspective viewing. However, geometric configurations
must be of sufficient complexity to support projective
invariants. For example, projective invariants cannot
be derived for 3 collinear points, 4 coplanar points,
etc.

* Supported in part by a grant from United States Air Force
Office of Scientific Research AFOSR-91-0361.

Quasi-invariants  have  been  proposed  by  Binford  [3]  as 
an  alternative  to  the  strict  requirements  of  invariants. 
The  idea  here  is  to  identify  properties  which  need  not 
be  invariant  for  the  whole  set  of  perspective  transfor¬ 
mations,  but  are  invariant  locally  and  remain  reason¬ 
ably  constant  over  the  entire  view  space.  The  advan¬ 
tage  is  that  many  more  quasi-invariants  are  available 
for  a  given  set  of  geometric  features  than  projective 
invariants. 

A  goal  of  this  work  is  to  understand  the  relationship 
between  quasi-invariants  and  invariants  and  to  see 
whether  invariants  and  quasi-invariants  can  be  un¬ 
derstood  in  a  single  framework.  This  will  enable  us 
to  use  invariants  whenever  they  exist  and  are  easy 
to  compute,  and  augment  them  by  quasi-invariants 
otherwise. 

Given that quasi-invariants vary over imaging transformations,
one obvious question is how they
vary over a practical range of viewpoints. We
are interested in studying the variability of quasi-invariants
over group actions and in classifying quasi-invariants
based on their group categories. While
this is not a general way to analyze quasi-invariants,
we will demonstrate that many useful insights can be
gained from the group invariant viewpoint.

2  Definitions 
2.1  Invariants 

The most well-known invariant for projective transformations
is the cross-ratio of 4 collinear
points (which is a ratio of length ratios). Five
coplanar points, which generalize 4 collinear
points to 2 dimensions, have two projective invariants.
Three points define a plane, so the ratio of
area ratios of various triangles defined by 5 points,

$$ \frac{\mathrm{Area}(P_1,P_2,P_4)\,/\,\mathrm{Area}(P_1,P_3,P_4)}{\mathrm{Area}(P_1,P_2,P_5)\,/\,\mathrm{Area}(P_1,P_3,P_5)}, $$

is a projective invariant. A second, independent ratio
can be constructed from another permutation of point
grouping. The concept of an invariant can be defined
more generally.

Given a group $G$ of transformations, a property, say
$I$, expressed in terms of parameters $P$ of a geometric
configuration, is said to be invariant with respect to
$G$ if

$$ I(P) = f(T)\, I(T(P)) \quad \text{for any } T \in G, $$

where $T$ is a transformation on a geometric configuration,
$T(P)$ are the parameters of the transformed
geometric configuration under $T$, and $f(T)$ is a function
on the transformation parameters. For absolute
invariants, $f(T)$ can be made 1.
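The five-point ratio of area ratios from the start of this section can be checked numerically against the definition above, with $f(T) = 1$. The following is a minimal sketch, not code from this work: the point coordinates and the matrix `H` are arbitrary values chosen for illustration, and the helper names are hypothetical.

```python
def tri_area(p, q, r):
    """Signed area of the triangle (p, q, r) for 2-D points."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def five_point_invariant(pts):
    """Ratio of area ratios for five coplanar points P1..P5."""
    p1, p2, p3, p4, p5 = pts
    return (tri_area(p1, p2, p4) * tri_area(p1, p3, p5)) / (
        tri_area(p1, p3, p4) * tri_area(p1, p2, p5))


def apply_homography(h, p):
    """Apply a 3x3 projective map h (nested lists) to a 2-D point p."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Illustrative data (assumed): five points in general position and a
# non-singular projective transformation of the plane.
points = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0), (1.2, 1.1), (0.6, 0.4)]
H = [[1.1, 0.2, 0.3], [-0.1, 0.9, 0.5], [0.05, 0.02, 1.0]]

before = five_point_invariant(points)
after = five_point_invariant([apply_homography(H, p) for p in points])
print(before, after)
```

The two printed values agree up to floating-point rounding, because each point and the determinant of `H` contribute the same scale factors to the numerator and the denominator.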

2.2 Quasi-invariants

Four coplanar points do not have any projective invariant,
since any set of four points can be mapped
onto any other set of four points by a projective
transformation. It is, however, possible to associate
quasi-invariants with 4 coplanar points, as discussed by
Binford and Levitt [3]. For example, the ratio of two
areas is a quasi-invariant under the projective
transformation between two planes, i.e.,

$$ \frac{\mathrm{Area}(P_1,P_2,P_4)}{\mathrm{Area}(P_1,P_2,P_3)}. $$
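The behaviour of this area ratio can be illustrated numerically. The sketch below uses arbitrary assumed coordinates and hypothetical helper names. It shows that the ratio is unchanged by an affine map, and that under a projective map it drifts by an amount that shrinks as the projective terms of the map shrink.

```python
def tri_area(p, q, r):
    """Signed area of the triangle (p, q, r) for 2-D points."""
    return 0.5 * ((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def area_ratio(pts):
    """Area(P1, P2, P4) / Area(P1, P2, P3) for four coplanar points."""
    p1, p2, p3, p4 = pts
    return tri_area(p1, p2, p4) / tri_area(p1, p2, p3)


def apply_map(h, p):
    """Apply a 3x3 projective map h (nested lists) to a 2-D point p."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def projective_map(eps):
    """Identity plus a projective term of size eps (illustrative choice)."""
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [eps, 0.0, 1.0]]


points = [(0.0, 0.0), (1.0, 0.2), (0.3, 1.0), (1.2, 1.1)]   # assumed values
affine = [[1.3, 0.4, 2.0], [-0.2, 0.8, -1.0], [0.0, 0.0, 1.0]]

reference = area_ratio(points)
affine_value = area_ratio([apply_map(affine, p) for p in points])


def drift(eps):
    """Absolute change of the area ratio under projective_map(eps)."""
    moved = [apply_map(projective_map(eps), p) for p in points]
    return abs(area_ratio(moved) - reference)


print(reference, affine_value, drift(0.1), drift(0.01))
```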

A  quasi-invariant  can  be  defined  more  generally  as 
follows. 

Like an invariant, a quasi-invariant $QI$ is a property
expressed in terms of parameters $P$ of a geometric
configuration. A property $QI$ is locally invariant
with respect to $S$, a set of transformations, at some
distinguished element $s$ in $S$, if for each transformation
parameter $t_i$, the first derivative of $QI$ with respect
to $t_i$ is 0 at $s$. The set $S$ need not be a group,
but all known quasi-invariants are defined with respect
to group invariants.

A  quasi-invariant  is  thus  defined  in  terms  of  three 
elements, 

•  a  geometric  configuration  and  associated  prop¬ 
erty  of  the  configuration, 

•  a  distinguished  set  of  image  transformations, 

•  a  specific  transformation  in  the  set  of  image 
transformations. 

In  principle,  there  can  be  configurations  and  prop¬ 
erties  which  are  quasi-invariant  at  different  points  in 
the  transformation  group.  However,  the  known  quasi¬ 
invariants  are  locally  invariant  when  the  image  plane 
is  parallel  to  the  geometric  configuration  and  the  im¬ 
age  view  point  is  at  infinity. 

A quasi-invariant $QI$ is called strong at a distinguished
element $s$ in $S$ if for each transformation parameter
$t_i$, in addition to the first derivatives being
0, the second derivatives of $QI$ with respect to $t_i$ also
vanish at $s$.

It is possible to further refine quasi-invariants by
requiring that their first $k$ derivatives with respect to
transformation parameters be 0 at a distinguished
element, for $k > 2$.

As a simple example, consider a point in 3-D; the
coordinates of an image of this point under perspective
viewing can be written as:

$$ \frac{f\,x\cos\theta}{z - x\sin\theta}, $$

where $z$ is the depth and $\theta$ is the rotation around the y-axis.
If the ratio of distances of two points equidistant
($\pm r$) from the origin is computed, an affine invariant

$$ \frac{z - r\sin\theta}{z + r\sin\theta} $$

is obtained. The derivatives of the above expression
with respect to $z$ and $\theta$ both vanish at $z = \infty$ provided
$r$ is finite (or $r \ll z$). As the reader can easily
verify, the above expression is not locally invariant
for values of $z$ which are comparable to $r$. This is
an example of a quasi-invariant that is locally invariant
at $\theta = 0$ or $z = \infty$, but not at other values of
$z$. In particular, the geometry of the object class (in
the above example, the value of $r$ in comparison to $z$) may
play a role.
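The dependence on depth in this example can be seen by estimating the two partial derivatives numerically. The sketch below uses central finite differences; the step size `h`, the value `r = 1`, and the sample depths are assumptions made for illustration only.

```python
import math


def ratio(z, theta, r=1.0):
    """The distance ratio (z - r sin(theta)) / (z + r sin(theta))."""
    return (z - r * math.sin(theta)) / (z + r * math.sin(theta))


def partials(z, theta, r=1.0, h=1e-4):
    """Central-difference estimates of d(ratio)/dz and d(ratio)/dtheta."""
    dz = (ratio(z + h, theta, r) - ratio(z - h, theta, r)) / (2 * h)
    dtheta = (ratio(z, theta + h, r) - ratio(z, theta - h, r)) / (2 * h)
    return dz, dtheta


# Depths comparable to r first, then depths much larger than r.
for depth in (2.0, 10.0, 100.0, 1000.0):
    d_z, d_theta = partials(depth, 0.5)
    print(depth, d_z, d_theta)
```

Both estimates shrink in magnitude as the depth grows relative to `r`, and both are far from zero when the depth is comparable to `r`, which matches the behaviour described in the text.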

3 A Group Framework for Quasi-Invariants

The  above  discussion  suggests  an  alternate  definition 
of  quasi-invariants  which  can  perhaps  be  useful  for 
studying  variability  in  quasi-invariants  with  respect 
to  transformation  parameters.  Admittedly,  this  defi¬ 
nition  may  rule  out  some  quasi-invariants  developed 
using  Binford  and  Levitt’s  definition  since  there  can 
be  functions  which  have  vanishing  derivatives  at  some 
point  in  the  group,  but  are  not  invariants  of  any  sub¬ 
group. 

All the examples of quasi-invariants discussed in Binford
and Levitt [3] can be studied using a framework
based on group invariants. First we observe that, since an
invariant does not depend upon any transformation
parameter, its first derivative with respect to any
transformation parameter is 0. This is the case
for every element in $G$.

Theorem: Given a group $G$ of transformations, an
invariant $I$ is a (strong) quasi-invariant for $G$ at every
element in $G$.

3.1  Group  Quasi-invariants 

A  key  relationship  between  invariants  and  quasi¬ 
invariants  is  illustrated  by  the  following  observation. 

Assume a group $G$ of transformations that can be
parameterized using a set $T$ of parameters, and whose
various subgroups can be parameterized using
subsets of $T$. Given two nontrivial subgroups $G_i, G_j$
of $G$ with transformation parameters $T_i, T_j$ such that
$G_i \subset G_j$, $T_i \subset T_j$, a property $QI$ is quasi-invariant
at a distinguished point $g \in G_j$ for $G_j$ if it is an
invariant for $G_i$ and, for every additional parameter
$t_j \in T_j - T_i$, the first derivative of $QI$ with respect to
$t_j$ is 0 at $g$. Quasi-invariants defined in this way are called
group quasi-invariants.

For perspective viewing there is a natural chain of
group transformations, equiform $\subset$ affine $\subset$ projective,
where each is more general (more group parameters)
than the previous. In general there will be a
hierarchy of transformations where

$$ G_1 \subset G_2 \subset \cdots \subset G_n. $$

4 Quantifying Quasi-invariant Variation

4.1  Variation  Based  on  Derivatives 

We  can  provide  a  basis  for  comparing  quasi-invariant 
variation  over  the  transformation  group  as  follows. 

Given two nontrivial subgroups $G_i, G_j$ of $G$, and a
distinguished element $g \in G_j$, a quasi-invariant $QI$
varies at least as much as $QI'$, up to $k$-th order, if for
every $k' \le k$, the absolute value of every $k'$-th derivative
of $QI$ with respect to transformation parameters in
$T_j - T_i$ is $\ge$ the absolute value of the corresponding
$k'$-th derivative of $QI'$. We also say that $QI'$ varies no
more than $QI$, up to $k$-th order.
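The comparison just defined can be sketched for the simplest case of a single transformation parameter and orders up to 2. This is an illustrative simplification, not the authors' procedure: derivatives are estimated by finite differences, mixed partial derivatives are not considered, and the two sample functions, the tolerance `tol`, and all names are hypothetical.

```python
def derivative(func, t0, order, h=1e-3):
    """Central finite-difference estimate of the 1st or 2nd derivative at t0."""
    if order == 1:
        return (func(t0 + h) - func(t0 - h)) / (2 * h)
    if order == 2:
        return (func(t0 + h) - 2 * func(t0) + func(t0 - h)) / (h * h)
    raise ValueError("only orders 1 and 2 are supported in this sketch")


def varies_at_least_as_much(qi, qi_prime, t0, k, tol=1e-6):
    """True if |d^m qi| >= |d^m qi_prime| at t0 for every order m <= k."""
    return all(
        abs(derivative(qi, t0, m)) + tol >= abs(derivative(qi_prime, t0, m))
        for m in range(1, k + 1))


def qi(t):
    """Sample property with zero first derivative at t = 0."""
    return 1.0 + 3.0 * t * t


def qi_prime(t):
    """Second sample property, flatter than qi near t = 0."""
    return 1.0 + t * t


print(varies_at_least_as_much(qi, qi_prime, 0.0, 2))
print(varies_at_least_as_much(qi_prime, qi, 0.0, 2))
```

Both sample functions have a vanishing first derivative at the distinguished value `t = 0`, so the comparison is decided by the second derivatives.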

Theorem: Given a group $G$ of transformations and
$G_j \subset G$, for any $k$, an invariant of $G_j$ varies no more
than any quasi-invariant $QI'$ of $G_j$, up to $k$-th order,
at any element of $G_j$.

Theorem: Given a group $G$ of transformations and
$G_j \subset G$, a strong quasi-invariant of $G_j$ varies no more
than any other quasi-invariant $QI'$ of $G_j$ up to 2nd
order at a distinguished element $g \in G_j$.

It is also possible to compare quasi-invariants associated
with two different pairs of subgroups for their
applicability and variability. Given a quasi-invariant
$QI$ with an associated pair of subgroups $(G_i, G_j)$ and
another, $QI'$, with an associated pair of subgroups
$(G'_i, G'_j)$ such that $G_i$ is a subgroup of $G'_i$ and $G'_j$
is a subgroup of $G_j$, $QI$ varies at least as much as
$QI'$ at a distinguished element $g \in G'_j$ if the absolute
value of every $k$-th derivative of $QI$ with respect
to parameters in $T_j - T_i$ is $\ge$ the absolute value of
the corresponding $k$-th derivative of $QI'$. Typically
the larger subgroups $G_j, G'_j$ are the same, and they
are $G$ itself, which is usually the group of projective
transformations.

4.2  The  Group  Hierarchy  Bound 

For a given geometric configuration, appropriate
quasi-invariants can be compared over the entire
space of image viewpoints. If a quasi-invariant is an
invariant of a more restricted group, its variation is
likely to be larger over the full set of image transformations
than that of a quasi-invariant based on a more general
group. For example, the angle between two lines
varies more than the ratio of two lengths on a line,
which in turn varies more than the cross-ratio, since
these are invariants of a properly contained group
hierarchy with respect to perspective.


Consider a configuration consisting of four equidistant
collinear points $(P_1, P_2, P_3, P_4)$ with another
point, say $P_5$, that is not on the line but is equidistant
from $P_2$ and $P_3$. It can be shown that the cross-ratio
of the four collinear points is invariant with respect to
perspective, but the collinear length ratio varies. However,
this length ratio varies less than a distance ratio
involving the off-line point $P_5$. The variability depends upon (i) the
distance  between  the  camera  and  geometric  configu¬ 
ration,  and  (ii)  the  coordinates  of  P2  and  P^. 
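The first part of this comparison can be checked numerically for the four collinear points. The sketch below reuses the perspective image coordinate from the example in Section 2.2, with any overall scale factor set to 1 (an assumption); the depths and angles are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def image_coordinate(x, z, theta):
    """Image of a point at position x on a line rotated by theta at depth z."""
    return x * math.cos(theta) / (z - x * math.sin(theta))


def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by 1-D coordinates."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))


def length_ratio(a, b, c):
    """Ratio of the two adjacent lengths |ab| / |bc| on a line."""
    return (b - a) / (c - b)


line_points = [0.0, 1.0, 2.0, 3.0]   # four equidistant collinear points


def view(z, theta):
    """Image coordinates of the four points for one viewing condition."""
    return [image_coordinate(x, z, theta) for x in line_points]


for z, theta in ((10.0, 0.5), (5.0, 0.3), (50.0, 0.5)):
    img = view(z, theta)
    print(z, theta, cross_ratio(*img), length_ratio(*img[:3]))
```

The cross-ratio column stays at the value it has for the original points, while the length ratio departs from 1 by an amount that depends on the depth and the rotation angle.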


We  believe  that  it  should  be  possible  to  perform  a 
global  comparison  of  appropriate  quasi-invariants  by 
comparing  their  associated  subgroups  for  which  they 
are  invariants.  Since  group  invariants  constitute  an 
algebra  (i.e.  any  rational  function  of  invariants  is  also 
an  invariant),  a  canonical  form  of  group  invariants 
may  be  needed  in  terms  of  some  basic  invariants  so 
that  basic  invariants  can  be  compared  for  variability. 
This aspect of the variability of quasi-invariants is
currently being investigated.


References 

[1] S.A. Abhyankar, “Invariant theory and enumerative
combinatorics of Young tableaux,” in Geometric
Invariance in Computer Vision (eds.
Mundy and Zisserman), MIT Press, 1992, 45-76.

[2] E.B. Barrett, P.M. Payton, N.N. Haag, and M.H.
Brill, “General methods for determining projective
invariants in imagery,” CVGIP: Image Understanding,
53, 1991, 46-65.

[3] T. Binford and T.S. Levitt, “Quasi-invariants:
Theory and exploitation,” Proc. DARPA Workshop
on Image Understanding, Washington,
D.C., 1993.

[4] J.L. Mundy and A. Zisserman (eds.), Geometric
Invariance in Computer Vision, MIT Press,
1992.

[5] J.L. Mundy, P.M. Payton, M.H. Brill, E.B. Barrett,
and R.P. Welty, “3-D model alignment
without computing pose,” Proc. DARPA Workshop
on Image Understanding, 1992, 727-735.




Recognizing  Algebraic  Surfaces  from  their  Outlines 

D.  A.  Forsyth 
Computer  Science  Division, 

University  of  California  at  Berkeley, 

Berkeley, 

CA  94720 

January  5,  1995 


Abstract 

The  outline  in  a  single  picture  of  a  generic  algebraic  surface  of  degree  three  or  greater 
completely  determines  the  projective  geometry  of  the  surface.  The  result  holds  for  a 
generic  perspective  view  of  a  generic  algebraic  surface,  where  the  camera  calibration 
parameters  and  the  focal  point  are  unknown.  Known  camera  calibration  appears  not  to 
reduce the projective ambiguity. The result is constructive.

Keywords: Recognition, Computer Vision, Algebraic Surfaces, Invariant Theory, Outlines.

1 Introduction


Outlines,  the  points  in  an  image  where  a  surface  turns  away  from  the  camera,  are  a  po¬ 
tentially  important  source  of  information  about  the  objects  in  a  scene.  Typically,  image 
edges  appear  at  most  outline  points,  and  image  edges  can  be  computed  reasonably  reliably. 
This  potential  has  not  been  realised  in  the  case  of  curved  surfaces,  because  the  complicated 
relationship  between  the  outline  of  a  curved  surface  and  the  surface  makes  outlines  hard 
to  interpret.  This  paper  shows  that,  although  the  relationship  between  surface  and  outline 
is  complicated,  for  a  large  class  of  surfaces  the  outline  is  sufficiently  highly  structured  to 
determine  the  surface’s  projective  geometry  from  a  single  view. 

1.1  Recognising  curved  surfaces 

There  have  been  many  approaches  to  recovering  shape  information  for  curved  surfaces  from 
images,  including  attempts  to  extend  line  labelling  to  curved  shapes  (e.g.  [10,  20]),  the 
development  of  constraint-based  systems  (e.g.  [3]),  the  study  of  how  the  topology  of  a 
surface’s  outline  changes  as  it  is  viewed  from  different  points,  formalised  into  a  structure 
known  as  an  aspect  graph  (for  example,  [16,  17,  26,  33,  34]),  and  studies  of  the  relationship 
between  the  differential  geometry  of  the  outline  and  that  of  the  surface,  both  for  single 
images  (e.g.  [17,  18,  21])  and  for  motion  sequences  (e.g.  [4,  11]). 

Each  approach  has  characteristic  disadvantages;  extensions  to  line  labelling  and  aspect 
graphs  can  be  extremely  complicated  for  even  simple  curved  surfaces  (examples  in  [27, 
33,  34]),  constraint-based  systems  must  search  a  model-base,  and  studies  of  differential 
properties  seldom  yield  sufficient  information  to  identify  a  surface.  Recently,  there  have  been 
attempts  to  represent  the  system  of  outlines  of  a  curved  surface  as  a  linear  combination  of 
outlines  (see,  for  example,  [42]).  This  approach  is  represented  as  providing  an  approximation 
sufficiently  accurate  for  some  purposes,  although  it  cannot  capture  all  the  complexities  of 
the  outline.  There  are  two  main  difficulties:  it  is  hard  to  specify  correspondences  between 
outline  points,  so  that  the  linear  combination  is  ill-defined;  and,  because  the  scheme  is 
based  on  a  local  approximation,  it  cannot  capture  the  global  interactions  between  a  surface 
and  a  focal  point  that  produce  the  outline.  However,  the  approximation  can  yield  plausible 
outlines  when  views  are  taken  from  similar  viewing  positions. 

Recovering  surface  geometry  from  a  single  outline  is  intractable  if  the  surface  is  con¬ 
strained  only  to  be  smooth  or  piecewise  smooth,  because  significant  changes  can  be  made  to 
the  surface  geometry  without  affecting  the  outline  from  a  given  viewpoint.  As  a  result,  an 
important  part  of  the  problem  involves  constructing  as  large  a  class  of  surfaces  as  possible 
that  can  either  be  directly  recognised  or  usefully  constrained,  from  their  outline  alone.  In 
this context, studies have focused on rotationally symmetric surfaces (for example, [6, 8]),
and straight homogeneous generalised cylinders (for example, [2, 28, 43, 44, 47]).

Recently, Ponce and Kriegman [29, 30, 31, 32, 33] focussed attention on algebraic surfaces.
Algebraic surfaces, which consist of all the points in space where a single polynomial
vanishes, have numerous advantages as objects of study:

•  Many  man-made  surfaces  are  made  up  of  “patches”  of  algebraic  surface,  as  most 
popular  CAD/CAM  surfaces  are  algebraic. 

• The geometry of an algebraic surface is determined by a relatively small number of
parameters (the coefficients of the polynomial that gives the surface). At the same
time, the surfaces have a rich and useful geometry.

•  Algebraic  surfaces  have  important  “rigidity”  properties.  For  example,  one  cannot  add 
a  local  bump  to  an  algebraic  surface  and  obtain  another  algebraic  surface;  the  whole 
surface  must  be  deformed  instead. 

Ponce  and  Kriegman  showed  that  elimination  theory  can  be  used  to  predict  the  outline 
of  an  algebraic  surface  viewed  from  an  arbitrary  viewing  position.  For  a  given  surface  a 
viewing  position  is  then  chosen  using  an  iterative  technique,  to  give  a  curve  most  like  the 
curve  observed.  The  object  is  then  recognized  by  searching  a  database,  and  selecting  the 
member  giving  the  best  fit  to  the  observed  outline.  This  work  shows  that  outline  curves 
strongly  constrain  the  viewed  surface,  but  has  the  disadvantage  that  it  cannot  recover 
surface  parameters  without  solving  an  optimization  problem,  so  that  for  a  big  model  base, 
each  model  may  have  to  be  tested  in  turn  against  the  image  outline.  Furthermore,  camera 
parameters  must  be  known  to  predict  the  outline  correctly. 

1.2  Indexing  for  recognition 

A  number  of  recent  papers  have  shown  how  indexing  can  be  used  to  avoid  searching  a 
model  base  (e.g.  [7,  19,  35,  39,  45]).  Objects  are  indexed  by  computing  descriptions  that 
are  unaffected  by  the  position  and  intrinsic  parameters  of  the  camera,  and  that  differ  from 
object  to  object.  These  descriptions,  often  known  as  indexing  functions,  have  the  same 
value  for  any  view  of  a  given  object,  and  so  can  be  used  to  index  into  a  model  base  without 
search. 

In a typical system that works for plane objects, projective invariants¹ are computed
for  a  range  of  geometric  primitives  in  the  image.  If  the  values  of  these  invariants  match  the 
values  of  the  invariants  for  a  known  model,  we  have  good  evidence  that  the  image  features 
are  within  a  camera  transformation  of  the  model  features.  As  a  result,  these  invariants 
index  into  a  model  base  directly.  Object  models  consist  of  a  system  of  invariant  values  and 
are  therefore  relatively  sparse,  meaning  that  hypothesis  verification  is  required  to  confirm  a 
model  match.  However,  no  searching  of  the  model  base  is  required  because  the  hypothesised 
object’s  identity  is  determined  by  the  invariant  descriptors  measured.  Systems  of  this  sort 
have  been  demonstrated  for  plane  objects  in  a  number  of  papers  [7,  19,  36,  40,  46,  48]. 
These  systems  are  attractive  because,  in  the  ideal  case,  an  object  description  is  computed 
from  the  image  and  identifies  the  object,  without  requiring  that  a  model  base  be  searched. 
As a result, systems with relatively large model bases can be constructed².
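The idea of indexing without search can be sketched with an ordinary hash table keyed on quantized invariant values. This is only an illustration under stated assumptions, not the system described in the cited papers: the class `ModelBase`, the bin width, the model names, and the invariant values are all hypothetical, and a practical system would also need to handle measurements that fall near bin boundaries and verify each hypothesis.

```python
from collections import defaultdict

BIN_WIDTH = 0.05  # hypothetical quantization step for invariant values


def index_key(invariants, bin_width=BIN_WIDTH):
    """Quantize a tuple of invariant values into a hashable table key."""
    return tuple(round(v / bin_width) for v in invariants)


class ModelBase:
    """Maps quantized invariant vectors to the names of matching models."""

    def __init__(self):
        self._table = defaultdict(list)

    def add(self, name, invariants):
        """Register a model under the key derived from its invariants."""
        self._table[index_key(invariants)].append(name)

    def lookup(self, invariants):
        """Return candidate models for measured invariants; no scan of all models."""
        return list(self._table.get(index_key(invariants), []))


model_base = ModelBase()
model_base.add("gasket_a", (1.6946, 0.52))    # made-up invariant values
model_base.add("bracket_b", (0.80, 2.10))

print(model_base.lookup((1.70, 0.51)))   # measured values close to gasket_a
print(model_base.lookup((3.00, 3.00)))   # no model stored under this key
```

Each lookup costs one hash computation regardless of how many models are stored; the returned names are hypotheses that would still need verification.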

In  the  case  of  plane  objects,  indexing  functions  are  easy  to  compute,  because  a  view 
of  a  plane  curve  from  an  arbitrary  focal  point  is  within  a  projective  transformation  of  the 
original  curve.  Constructing  indexing  functions  for  three  dimensional  objects  is  challenging, 
because  a  change  in  viewing  position  can  lead  to  a  profound  change  in  the  geometry  of  the 
outline.  Furthermore,  any  indexing  function  should  be  both  invariant  and  computable  from 
outline  information  alone.  Indexing  functions  with  these  properties  have  been  demonstrated 
for  polyhedra  [37],  and  for  rotationally  symmetric  surfaces  [8]. 

This  paper  shows  that  such  indexing  functions  can  be  computed  for  algebraic  surfaces 
viewed  in  perspective  using  an  uncalibrated  camera,  by  establishing: 

¹A clear introduction to applying invariant theory in computer vision appears in [22].

²Current systems using indexing functions have model bases containing on the order of thirty objects.

Theorem: The equation of its outline in a perspective camera completely determines
the projective geometry of an algebraic surface of degree 2 or greater,
for a generic view of a generic algebraic surface.

Here generic means “almost every”; precisely, the generic algebraic surfaces are all
algebraic surfaces except those whose coefficients satisfy a non-trivial system of algebraic
relations, to be determined later. At this point, we assume that the surface is smooth and
irreducible. The result is similar to that independently obtained by [5], who demonstrated
necessary and sufficient conditions for a curve to be an outline, but did not show that an
outline determines a surface. Note that the theorem is trivial for surfaces of degree two, as
generic surfaces of degree 2 are all projectively equivalent. The main result is given by the
following two properties of outlines:

•  The  contour  generator  of  a  generic  algebraic  surface  is  determined  as  a  space  curve 
uniquely  (up  to  a  projectivity  of  space),  by  a  generic  projection  of  the  curve  on  to 
a  plane.  In  particular,  the  outline  of  an  algebraic  surface  in  a  given  view  contains 
sufficient  information  to  compute  both  the  contour  generator  of  that  surface  and 
the  focal  point  through  which  the  contour  generator  was  formed,  together  in  some 
arbitrary  projective  frame. 

•  Given  the  contour  generator  of  a  generic  surface  viewed  from  a  generic  focal  point, 
and  that  focal  point,  the  surface  can  be  uniquely  determined. 

2  The  outline  of  an  algebraic  surface 

Throughout  the  paper,  we  assume  an  idealised  pinhole  camera.  These  cameras  possess  a 
focal  point  and  an  image  plane.  Points  in  space  appear  in  the  image  as  the  intersection 
between  the  image  plane  and  a  line  through  the  focal  point  and  the  point  in  space.  An 
orthographic  view  occurs  when  the  focal  point  is  “at  infinity”. 
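The camera model just described amounts to intersecting a line with a plane. A minimal sketch in ordinary (non-homogeneous) 3-D coordinates is given below; the function name `project` and the representation of the image plane by a point and a normal vector are choices made here for illustration.

```python
def project(point, focal, plane_point, plane_normal):
    """Intersect the line through `focal` and `point` with the image plane.

    The plane is given by one point on it and its normal vector.
    """
    direction = [p - f for p, f in zip(point, focal)]
    denom = sum(n * d for n, d in zip(plane_normal, direction))
    if abs(denom) < 1e-12:
        # The ray is parallel to the image plane: no finite image point.
        raise ValueError("ray is parallel to the image plane")
    t = sum(n * (q - f)
            for n, q, f in zip(plane_normal, plane_point, focal)) / denom
    return tuple(f + t * d for f, d in zip(focal, direction))


# Focal point at the origin, image plane z = 1 (illustrative values).
image_point = project((2.0, 4.0, 2.0), (0.0, 0.0, 0.0),
                      (0.0, 0.0, 1.0), (0.0, 0.0, 1.0))
print(image_point)
```

Rays parallel to the image plane have no finite image in this sketch, so the function raises an error for them.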

It is easy to see that if the focal point is fixed and the image plane is moved, the resulting
distortion of the image is a collineation, a one-to-one map of the projective plane to itself
that takes straight lines to straight lines. In what follows, it is assumed that neither the
position of the image plane with respect to the focal point nor the size and aspect ratio of
the pixels on the camera plane is known, so that the image presented to the algorithm is
within some arbitrary collineation of the “correct” image. In this model, the image plane
makes no contribution to the geometry, and its position in space is ignored.

The  outline  of  a  surface  is  a  plane  curve  in  the  image,  which  itself  is  the  projection  of  a 
space curve, known as a contour generator³. The contour generator is given by those points
on  the  surface  where  the  surface  turns  away  from  the  image  plane;  formally,  the  ray  through 
the  focal  point  to  the  surface  is  tangent  to  the  surface.  As  a  result,  at  an  outline  point, 
if  the  relevant  surface  patch  is  visible,  nearby  pixels  in  the  image  will  see  vastly  different 
points  on  the  surface,  and  so  outline  points  usually  have  sharp  changes  in  image  brightness 
associated  with  them.  Figure  1  illustrates  these  concepts. 

³There are a number of widely used terms for both curves, and no standard terminology has yet emerged.

Figure 1: The outline and contour generator of a curved object, viewed from a perspective
camera.

We shall study generic algebraic surfaces in projective space, otherwise written as $P^3$. A
point in $P^3$ is given by four homogeneous coordinates, where two sets of homogeneous
coordinates refer to the same point if they are within a scalar multiple of one another (Appendix
1 in [23] contains examples and discussions of the practical applications of homogeneous
coordinates). Projective three-space is similar to the three dimensional space in which we
live, but contains provision for a plane of infinitely distant points as well. In $P^3$, an
algebraic surface is given by the vanishing of a single homogeneous polynomial in these four
coordinates. We will assume that the camera has an infinite film plane as well, so that the
image plane can be modelled by $P^2$, the projective plane. The points in space that project
to points “at infinity” in the camera film plane lie on a plane parallel to the image plane
and passing through the focal point.

We  use  the  following  notation: 

• $(u_0, u_1, u_2, u_3)$ are the coordinates of a point in $P^3$.

• $(x_0, x_1, x_2)$ are the coordinates of a point in $P^2$, the image plane.

• $(f_0, f_1, f_2, f_3)$ is the camera focal point.

• $S(u_0, u_1, u_2, u_3)$ is the homogeneous polynomial that vanishes on the surface.

• $d$ is the degree of $S$.


Since the surface is generic, it is irreducible, and so $S(u_0, u_1, u_2, u_3)$ does not factor. The
contour generator lies on the surface, and so $S(u_0, u_1, u_2, u_3) = 0$ on the contour generator.
The plane tangent to the surface at a point on the contour generator must pass through
the focal point, by the definition of the contour generator. As a result, the expression

$$ f_0 \frac{\partial S}{\partial u_0} + f_1 \frac{\partial S}{\partial u_1} + f_2 \frac{\partial S}{\partial u_2} + f_3 \frac{\partial S}{\partial u_3} $$

vanishes on the contour generator. This expression will be called $T$ for short in what follows.
We  have  immediately  that: 

•  The  contour  generator  is  an  algebraic  space  curve  given  by  the  vanishing  of  just  two 
polynomials⁴, $S$ and $T$, and so is a complete intersection. Furthermore, for a generic
choice  of  the  focal  point,  T  has  degree  d  -  1,  and  so  the  contour  generator  has  degree 
d(d  -  1).  The  surface  given  by  T  =  0  is  known  as  the  first  polar  of  S. 

•  The  family  of  contour  generators  on  a  surface  is  a  family  of  curves  linearly  parametrised 
by  focal  points  alone.  Such  families  are  known  to  algebraic  geometers  as  linear  systems, 
and  are  widely  studied.  The  study  of  such  systems  makes  general  statements  about 
contour  generators  possible.  For  example,  a  generic  contour  generator  on  a  generic 
surface  is  smooth,  by  Bertini’s  theorem®.  In  fact,  the  contour  generator  is  a  plane 
section  of  the  dual  of  the  surface,  where  the  sectioning  plane  depends  on  the  focal  point 
chosen.  The  simplicity  of  the  system  of  contour  generators  stands  in  stark  contrast 
to  the  complexity  displayed  by  the  family  of  outlines,  as  the  focal  point  changes, 
information  conventionally  captured  by  an  aspect  graph. 

^Some  algebraic  curves  in  space  must  be  given  by  more  than  two  equations  -  see,  for  example  [24] 

generic  element  of  a  linear  system  is  smooth  away  from  its  base  points  (e.g.  [13]);  note  that  for 
a  smooth  surface,  the  system  of  contour  generators  has  no  b2ise  points,  but  if  the  surface  is  singular,  the 
singularities  of  the  surface  are  base  points,  and  the  contour  generator  may  be  singular. 
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•  Contour  generators  are  projectively  covariant;  that  is,  for  a  surface  S,  viewed  from 
a  focal  point  /,  with  contour  generator  C,  if  P  is  an  arbitrary  projectivity  of  space, 
then  P{C)  is  the  contour  generator  of  P(5)  viewed  from  P(/).  This  is  because  the 
contour  generator  is  defined  by  tangency  and  incidence  conditions  alone. 
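
As a small worked illustration of the first polar (the quadric below is chosen here purely as an example and does not appear in the original text), take $d = 2$ and an arbitrary focal point $(f_0, f_1, f_2, f_3)$:

```latex
% Illustrative assumption: S is the quadric below (d = 2).
\begin{align*}
S(u_0,u_1,u_2,u_3) &= u_0^2 + u_1^2 + u_2^2 - u_3^2, \\
T = \sum_{i=0}^{3} f_i \frac{\partial S}{\partial u_i}
  &= 2\,(f_0 u_0 + f_1 u_1 + f_2 u_2 - f_3 u_3).
\end{align*}
% T is linear (degree d - 1 = 1), so T = 0 is a plane, and the contour
% generator S = T = 0 is a plane conic of degree d(d - 1) = 2.
```

In this case the first polar is the classical polar plane of the focal point with respect to the quadric, which agrees with the degree count $d(d - 1)$ stated above.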

It  is  important  to  note  that  the  treatment  that  follows  assumes  that  the  complex  points 
of  both  the  algebraic  curve  and  the  algebraic  surface  are  meaningful.  For  example,  when  a 
count  is  given  of  the  number  of  singular  points  on  the  outline  of  a  given  type,  that  count 
includes  the  complex  singularities.  It  is  conceivable  that  an  algebraic  surface  could  have 
an  outline  that  consisted  entirely  of  complex  points;  a  natural  example  is  the  outline  of  a 
sphere  viewed  from  a  focal  point  lying  inside  the  sphere.  In  this  case,  while  in  principle 
the  outline  constrains  the  surface  just  as  effectively  as  if  it  had  a  large  collection  of  real 
points,  in  practice  the  outline  is  difficult  to  observe.  Another  effect  that  can  make  the 
outline difficult to observe is self-occlusion, where sections of the outline are occluded by
the surface and so are, in practice, invisible. Self-occlusion is difficult to study given the
methods  here;  visibility  can  change  only  at  singularities,  however. 

This  is  why  the  statement  of  the  theorem  emphasizes  the  equation  of  the  outline.  In 
practice,  if  the  outline  has  sufficient  visible  real  points  that  a  fitting  process  can  determine 
its  equation  from  the  real  points  alone,  its  complex  points  and  singularities  follow.  In 
principle,  a  fitting  process  should  be  robust  to  occlusions,  as  for  irreducible  algebraic  curves 
(the  genericity  assumptions  assure  that  the  outlines  covered  in  this  paper  are  irreducible), 
only  a  finite  number  of  points  is  necessary  to  determine  the  equation  of  the  curve.  This 
means  that,  to  apply  the  result,  we  must  assume  that  the  view  yields  enough  real  points 
on  the  outline  to  determine  its  equation;  this  is  not  a  particularly  strong  restriction  in 
principle. 

2.1  The  singularities  of  the  outline 

Since  the  contour  generator  of  a  non-singular  algebraic  surface  viewed  through  a  generic 
focal  point  is  smooth,  the  singularities  of  the  outline  must  be  a  result  of  the  projection 
from  the  contour  generator  to  the  outline.  These  singularities  are  the  key  to  obtaining  the 
contour  generator  from  the  outline;  fortunately,  they  are  highly  structured.  Generically 
there  are  only  cusps  and  nodes.  The  following  results  have  been  known  since  at  least  the 
late  19th  century  (see,  for  example,  [1]). 

2.1.1  Cusps 

A  cusp  in  the  outline  is  a  local  event  on  the  contour  generator,  so  that  cusps  are  relatively 
easily  studied;  a  cusp  occurs  when  the  contour  generator  is  tangent  to  the  ray  through  the 
focal  point  (see,  for  example,  [16]  for  this  widely  known  result). 

Lemma 1: Cusps in the outline are the projections of points on the contour generator
where the second polar of the surface through the focal point vanishes; accordingly, there are
$d(d - 1)(d - 2)$ cusps in the outline of a surface of degree $d$.

Proof: If $p$ is a point on the contour generator that projects to a cusp on the
outline, and $f$ is the focal point, then the line $pf$ is tangent to the contour
generator at $p$. The line tangent to the contour generator at $p$ is given by the
intersection of the plane tangent to the surface at $p$ and the plane tangent to
the first polar at $p$. Because this line passes through $f$, the plane tangent to the
first polar at $p$ must pass through $f$ as well. Recall that the surface was written
as $S$ and the first polar was written as $T$. We have that the expression:

$$ f_0 \frac{\partial T}{\partial u_0} + f_1 \frac{\partial T}{\partial u_1} + f_2 \frac{\partial T}{\partial u_2} + f_3 \frac{\partial T}{\partial u_3} $$

must vanish at $p$. This expression, which is the first polar of $T$ through $f$, is
also known as the second polar of $S$ through $f$, and has degree $d - 2$ if $d$ is the
degree of $S$; call this expression $P$ for convenience.

In turn, if $S$, $T$ and $P$ vanish at a point $p$, then the point is on the surface and
on the contour generator by definition; furthermore, the plane tangent to the
surface at $p$ passes through $f$, and the plane tangent to $T$ at $p$ passes through $f$,
so their intersection, which is tangent to the contour generator, passes through
$f$. Thus, the outline cusps at exactly the projections of those points where $S$, $T$ and $P$
vanish; by Bezout's theorem there are $d(d - 1)(d - 2)$ such points, and so the
outline has $d(d - 1)(d - 2)$ cusps. □


2.1.2  Double  points 

Double points (nodes) on the outline occur when a line through the focal point is tangent
to $S$ in two distinct points, and so are global events; determining the number of double
points on the outline requires more complex reasoning.

Lemma 2: There are $\frac{1}{2} d(d - 1)(d - 2)(d - 3)$ double points on the outline of an algebraic
surface of degree $d$.

Proof: The contour generator is a complete intersection, and so its genus is
given by the formula

$$ \frac{1}{2} d(d - 1)(2d - 5) + 1 $$

where $d$ is the degree of the surface (cf. [14], p. 188, ex 8.4g). Project the contour
generator into the image through the focal point; the resulting curve is birational
to the contour generator, and so has the same genus. The singularities are stable
(by the generic choice of surface and focal point), so the genus-degree formula
for plane curves yields that

$$ \frac{1}{2} d(d - 1)(2d - 5) + 1 = \frac{1}{2} (d(d - 1) - 1)(d(d - 1) - 2) - (n_c + n_d) $$

where $n_c$ is the number of cusps and $n_d$ is the number of double points. Rearranging
the formula and substituting the above result on the number of cusps
yields $n_d = \frac{1}{2} d(d - 1)(d - 2)(d - 3)$. □
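
The two counts can be checked against the genus argument numerically. The following is a minimal sketch (the function names are illustrative, not from the original text) that evaluates the cusp and node counts of Lemmas 1 and 2 and confirms that the genus of the complete intersection equals the genus given by the genus-degree formula for a plane curve of degree $d(d - 1)$ with those singularities.

```python
def cusp_count(d):
    """Number of cusps on the outline (Lemma 1): d(d-1)(d-2)."""
    return d * (d - 1) * (d - 2)


def node_count(d):
    """Number of double points on the outline (Lemma 2): d(d-1)(d-2)(d-3)/2."""
    return d * (d - 1) * (d - 2) * (d - 3) // 2


def genus_of_contour_generator(d):
    """Genus of the complete intersection of degrees d and d-1."""
    return d * (d - 1) * (2 * d - 5) // 2 + 1


def genus_of_outline(d):
    """Genus-degree formula for a plane curve of degree n = d(d-1),
    reduced by one for each cusp and each node."""
    n = d * (d - 1)
    return (n - 1) * (n - 2) // 2 - cusp_count(d) - node_count(d)


for d in range(3, 9):
    # The outline is birational to the contour generator, so the genera agree.
    assert genus_of_contour_generator(d) == genus_of_outline(d)
    print(d, cusp_count(d), node_count(d), genus_of_contour_generator(d))
```

For a cubic surface, for instance, this gives six cusps and no double points.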

2.2  Global  properties  of  the  singularities  of  the  outline 

A  property  of  the  outline  that  will  prove  important  later  is  that  its  singularities  lie  on  the 
intersection  of  two  plane  curves,  whose  degree  (which  is  relatively  low  for  the  number  of 
points)  can  be  determined  using  elimination  theory.  These  curves  can  be  studied,  without 
loss of generality, by assuming that the focal point is the point $(0, 0, 0, 1)$. This makes
computing the outline relatively simple; a point $(u_0, u_1, u_2, u_3)$ projects through $(0, 0, 0, 1)$
to $(u_0, u_1, u_2)$, because if $(u_0, u_1, u_2)$ are fixed and the fourth coordinate varies, the locus
of points obtained is a line, which tends to the focal point $(0, 0, 0, 1)$ as $u_3$ becomes large.
Thus, $(u_0, u_1, u_2)$ yield the line, and $u_3$ is a coordinate along the line.

The equation of a surface $S$ of degree $d$ can be rewritten as:

$$ S(u_0, u_1, u_2, u_3) = H_0(u_0, u_1, u_2) u_3^d + H_1(u_0, u_1, u_2) u_3^{d-1} + \cdots + H_d(u_0, u_1, u_2) = 0 $$

where $H_i(u_0, u_1, u_2)$ is homogeneous of degree $i$ in $u_0$, $u_1$ and $u_2$. The focal point is $(0, 0, 0, 1)$,
so that the first polar through the focal point is:

$$ \frac{\partial S}{\partial u_3} = d H_0(u_0, u_1, u_2) u_3^{d-1} + (d - 1) H_1(u_0, u_1, u_2) u_3^{d-2} + \cdots + H_{d-1}(u_0, u_1, u_2) $$

and this vanishes on the contour generator too. Now the outline consists of those points
$(u_0, u_1, u_2)$ where both equations vanish for a common value of $u_3$; the equation of the
outline is therefore obtained by eliminating $u_3$ between $S$ and $\partial S / \partial u_3$. The singularities
of the outline all have multiplicity two, and are those points $(u_0, u_1, u_2)$ where $S$ and
$\partial S / \partial u_3$ have two common roots in $u_3$. The
equations yielding these points can be obtained using a technique from Salmon [38].
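
As a worked illustration of the elimination step (the case $d = 2$ is chosen here only because the algebra is short; it is not discussed in the original text), the outline of a quadric follows by solving the first polar for $u_3$ and substituting:

```latex
% Illustrative case d = 2, focal point (0, 0, 0, 1).
\begin{align*}
S &= H_0 u_3^2 + H_1 u_3 + H_2, &
\frac{\partial S}{\partial u_3} &= 2 H_0 u_3 + H_1 = 0
  \;\Longrightarrow\; u_3 = -\frac{H_1}{2 H_0}, \\
S\Big|_{u_3 = -H_1/(2H_0)} &= -\frac{H_1^2}{4 H_0} + H_2 = 0
  &&\Longrightarrow\; H_1^2 - 4 H_0 H_2 = 0 .
\end{align*}
% H_0 is a constant, H_1 is linear and H_2 is quadratic in (u_0, u_1, u_2),
% so the outline is a conic: degree d(d - 1) = 2, with no cusps or nodes.
```

The result is the familiar discriminant of a quadratic in $u_3$; for higher degrees the same elimination is carried out with a resultant.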
Consider two polynomials in $u_3$,

$$ F(u_3) = \sum_{i=0}^{d} a_{d-i} u_3^i, \qquad G(u_3) = \sum_{i=0}^{d-1} b_{d-1-i} u_3^i $$

If $F$ and $G$ have two common roots, then there must be some

$$ M(u_3) = \sum_{i=0}^{d-3} A_{d-3-i} u_3^i \qquad \text{and} \qquad N(u_3) = \sum_{i=0}^{d-2} A_{2d-4-i} u_3^i $$

such that $FM + GN = 0$ identically. Up to sign and scale, $M$ is the product of all the factors
of $G$ that do not appear in $F$, and $N$ is the product of all the factors of $F$ that do not appear in $G$. The
polynomial $FM + GN$ has degree $2d - 3$, and there are $2d - 3$ unknown $A_i$, and $2d - 2$
monomials in $u_3$. We can construct a $2d - 3$ by $2d - 2$ matrix $C$, such that $FM + GN = a C u$,
where $a$ is the vector

$$ (A_0, A_1, \ldots, A_{d-3}, A_{d-2}, A_{d-1}, \ldots, A_{2d-5}, A_{2d-4}), $$

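$u$ is the vector

$$ (u_3^{2d-3}, u_3^{2d-4}, \ldots, u_3, 1)^T $$

and $C$ is the $2d - 3$ by $2d - 2$ matrix whose entries are:

$$ C = \begin{pmatrix}
a_0 & a_1 & a_2 & \cdots & a_d & 0 & \cdots & 0 \\
0 & a_0 & a_1 & a_2 & \cdots & a_d & \cdots & 0 \\
\vdots & & \ddots & & & & \ddots & \vdots \\
0 & \cdots & 0 & a_0 & a_1 & a_2 & \cdots & a_d \\
b_0 & b_1 & \cdots & b_{d-2} & b_{d-1} & 0 & \cdots & 0 \\
0 & b_0 & b_1 & \cdots & b_{d-2} & b_{d-1} & \cdots & 0 \\
\vdots & & \ddots & & & & \ddots & \vdots \\
0 & \cdots & 0 & b_0 & b_1 & \cdots & b_{d-2} & b_{d-1}
\end{pmatrix} $$

Here the first $d - 2$ rows contain shifted copies of the coefficients of $F$, and the last $d - 1$ rows
contain shifted copies of the coefficients of $G$.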

Since, for an appropriate choice of $M$ and $N$, $FM + GN = 0$ identically (i.e. all the coefficients
vanish), there is some nonzero choice of $a$ such that $aC = 0$. Hence, the $2d - 3$ by $2d - 3$ minors
of $C$ must vanish.
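
The vanishing of the minors can be tested on a numeric example. The sketch below is an illustration only: the helper names and the sample polynomials are assumptions made here, and the coefficient ordering (highest power first) is one consistent reading of the construction above. It builds $C$ for a quartic $F$ and a cubic $G$, and compares the case of two common roots with the case of a single common root.

```python
def poly_from_roots(roots):
    """Coefficients, highest power first, of the monic polynomial with these roots."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                      # coeffs * u
        scaled = [0] + [-r * c for c in coeffs]     # coeffs * (-r)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs


def det(m):
    """Exact determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def build_C(a, b):
    """(2d-3) x (2d-2) matrix: d-2 shifted rows of a, then d-1 shifted rows of b."""
    d = len(a) - 1
    assert len(b) == d
    width = 2 * d - 2
    rows = [[0] * k + a + [0] * (width - len(a) - k) for k in range(d - 2)]
    rows += [[0] * k + b + [0] * (width - len(b) - k) for k in range(d - 1)]
    return rows


def maximal_minors(C):
    """Determinants obtained by striking one column of C at a time."""
    return [det([row[:j] + row[j + 1:] for row in C]) for j in range(len(C[0]))]


F = poly_from_roots([1, 2, 3, 5])        # degree d = 4
G_two = poly_from_roots([1, 2, -4])      # shares the roots 1 and 2 with F
G_one = poly_from_roots([1, -4, -7])     # shares only the root 1 with F

minors_two = maximal_minors(build_C(F, G_two))   # every minor vanishes
minors_one = maximal_minors(build_C(F, G_one))   # at least one minor is non-zero
print(minors_two)
print(minors_one)
```

With a single common root there is no non-trivial pair $(M, N)$ of the required degrees, so $C$ has full rank and the minors do not all vanish.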

In our case, $a_j = H_j$, and $b_j = (d - j) H_j$. The method of construction of the matrix
ensures that the minors are homogeneous; the degree of a minor in $(u_0, u_1, u_2)$ can be determined
by computing the degree of a typical monomial in the minor. Such a monomial can
be obtained by striking one column of $C$, and multiplying $2d - 3$ elements from the remaining
square matrix using each row and each column only once. The resulting monomial will
have the form $H_a H_b H_c \cdots$, and its degree is the sum of the subscripts. This process shows
the degrees of the minors are:

$$ (d - 1)(d - 2),\; (d - 1)(d - 2) + 1,\; (d - 1)(d - 2) + 2,\; \ldots,\; (d - 1)(d - 2) + 2d - 3 $$
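
For a concrete instance of this degree count, the case $d = 3$ can be written out in full. This is a worked illustration added here, using $a_j = H_j$ and $b_j = (3 - j) H_j$ as above and reading a typical monomial off the diagonal of each minor:

```latex
% Illustrative case d = 3: C is 3 x 4.
\[
C = \begin{pmatrix}
H_0 & H_1 & H_2 & H_3 \\
3H_0 & 2H_1 & H_2 & 0 \\
0 & 3H_0 & 2H_1 & H_2
\end{pmatrix}
\]
% Striking one column and taking the diagonal monomial of what remains:
%   strike column 4:  H_0 (2H_1)(2H_1)   -> degree 0 + 1 + 1 = 2
%   strike column 3:  H_0 (2H_1) H_2     -> degree 0 + 1 + 2 = 3
%   strike column 2:  H_0 H_2 H_2        -> degree 0 + 2 + 2 = 4
%   strike column 1:  H_1 H_2 H_2        -> degree 1 + 2 + 2 = 5
% i.e. the degrees (d-1)(d-2) + k for k = 0, 1, 2, 3 = 2d - 3.
```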

At a singularity of the outline, these minors must all vanish, so that there exists a family of
curves, which intersect at most in points, of these degrees, which pass through the singular
points. In particular, the singularities must lie on (though not necessarily exhaust) the
intersection of a curve of degree $(d - 1)(d - 2)$ with a curve of degree $(d - 1)(d - 2) + 1$,
where these curves do not have a common component.

This means that the singularities are strongly constrained. A curve of degree $s$ has
$\frac{1}{2}(s + 1)(s + 2)$ coefficients, meaning that $\frac{1}{2}(s + 1)(s + 2) - 1$ general points uniquely
specify such a curve. There are in total $\frac{1}{2}((d^2 - 2d - 1) + 1)((d^2 - 2d - 1) + 2)$ singularities;
if these were in general position, no curve of degree $d^2 - 2d - 1$ would pass through all of them,
and the lowest degree curve that would pass through all of them
would have degree $d^2 - 2d = d(d - 2)$, which is larger than $(d - 1)(d - 2)$.
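
The coefficient count can be made concrete with a short calculation. The sketch below (function names are illustrative assumptions) applies the rule stated above, that a curve of degree $s$ can be passed through $\frac{1}{2}(s + 1)(s + 2) - 1$ general points, to find the lowest degree that a curve through the same number of points in general position would need, and sets it beside the degree $(d - 1)(d - 2)$ of the curve on which the singularities actually lie.

```python
def singularity_count(d):
    """Cusps plus double points on the outline of a degree-d surface."""
    cusps = d * (d - 1) * (d - 2)
    nodes = d * (d - 1) * (d - 2) * (d - 3) // 2
    return cusps + nodes


def coefficient_count(s):
    """Number of coefficients of a plane curve of degree s."""
    return (s + 1) * (s + 2) // 2


def lowest_degree_through_general_points(n_points):
    """Smallest s such that a degree-s curve passes through n_points general points."""
    s = 1
    while coefficient_count(s) - 1 < n_points:
        s += 1
    return s


for d in range(3, 8):
    n = singularity_count(d)
    generic_degree = lowest_degree_through_general_points(n)
    actual_degree = (d - 1) * (d - 2)
    # Columns: surface degree, number of singularities,
    # degree needed for general points, degree actually achieved.
    print(d, n, generic_degree, actual_degree)
```

For a cubic surface, the six cusps lie on a conic, whereas six points in general position lie on no conic.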

The matrix $C$ yields a great deal of information about the structure of the problem.
Write

$$ C = (c_0, c_1, c_2, \ldots, c_{2d-3}) $$

where the $c_j$ are column vectors. Let

$$ C_l = (c_0, c_1, c_2, \ldots, c_{2d-4}) $$

By inspecting the diagonal elements, it can be seen that the determinant of $C_l$, which is
square, has degree $(d - 1)(d - 2)$. Let $D = \mathrm{Adjoint}(C_l)$ (where the adjoint is the transpose
of the matrix of cofactors), and let

$$ C_r = (c_0, c_1, \ldots, c_{2d-6}, c_{2d-5}, c_{2d-3}) $$

Inspecting the diagonal elements shows that $\mathrm{Det}(C_r)$ has degree $(d - 1)(d - 2) + 1$. Write
$\mathcal{P}$ for $\mathrm{Det}(C_l)$ and $\mathcal{Q}$ for $\mathrm{Det}(C_r)$. Now both $\mathcal{P}$ and $\mathcal{Q}$ are $2d - 3$ by $2d - 3$ minors of $C$,
and so must vanish on all the singular points of the outline.

Since $Cu$ is a vector of polynomials, all of which vanish at every point on the contour
generator, $DCu$ must also consist of a vector of polynomials, each of which vanishes at
every point on the contour generator. In particular, the last row of $DC$ has the form:

$$ (0, 0, 0, \ldots, 0, \mathcal{P}, \mathcal{Q}) $$

and so the last element of $DCu$ is the equation

$$ \mathcal{P} u_3 + \mathcal{Q} $$

which must vanish at every point on the contour generator. This equation cannot vanish
trivially; that is, at "almost every" point on the contour generator, both $\mathcal{P}$ and $\mathcal{Q}$ are
non-zero, by the following argument:

both $\mathcal{P}$ and $\mathcal{Q}$ are homogeneous polynomials in the variables $u_0$, $u_1$ and $u_2$ and
so they vanish on a cone passing through the point $(0, 0, 0, 1)$. If $\mathcal{P}$ and $\mathcal{Q}$ were
to vanish on the entire contour generator, this cone would contain the contour
generator, and so $\mathcal{P}$ and $\mathcal{Q}$ would have to vanish on the projection of the contour
generator through the point $(0, 0, 0, 1)$ to any plane. However, a projection of
the contour generator to a plane through $(0, 0, 0, 1)$ must (by the generic choice
of surface) be irreducible and have degree $d(d - 1)$; neither $\mathcal{P}$ nor $\mathcal{Q}$ can vanish
at every point of an irreducible curve of this degree, because their degrees are
too low.

This means that, if $\mathcal{P}$ and $\mathcal{Q}$ can be determined from the image, the contour generator
can be reconstructed from the outline, because the "missing" homogeneous coordinate of the
contour generator, $u_3$ (which can loosely be thought of as "depth") can be determined as

$$ u_3 = -\frac{\mathcal{Q}}{\mathcal{P}} $$

This expression, though not strictly a function, is meaningful, because the degree of $\mathcal{Q}$ is one
larger than the degree of $\mathcal{P}$; as a result, if $(u_0, u_1, u_2)$ were to be scaled by $\lambda$, the expression
for $u_3$ would be scaled by $\lambda$ too.
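
The depth formula can be illustrated on a single ray through the focal point. In the sketch below (helper names and the sample cubic are assumptions made for illustration) the polynomial $F$ stands for $S$ restricted to one ray, with a double root at $u_3 = 2$, so that $F$ and its derivative $G$ share exactly that root, as they do at a non-singular point of the outline. The two determinants recover the shared root.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def build_C(a, b):
    """(2d-3) x (2d-2) matrix: d-2 shifted rows of a, then d-1 shifted rows of b."""
    d = len(a) - 1
    width = 2 * d - 2
    rows = [[0] * k + a + [0] * (width - len(a) - k) for k in range(d - 2)]
    rows += [[0] * k + b + [0] * (width - len(b) - k) for k in range(d - 1)]
    return rows


def select_columns(C, cols):
    return [[row[j] for j in cols] for row in C]


# F(u) = (u - 2)^2 (u - 5) = u^3 - 9u^2 + 24u - 20, highest power first.
F = [1, -9, 24, -20]
# G(u) = F'(u) = 3u^2 - 18u + 24; it shares only the root u = 2 with F.
G = [3, -18, 24]

C = build_C(F, G)
width = len(C[0])
P = det(select_columns(C, list(range(width - 1))))                 # Det(C_l)
Q = det(select_columns(C, list(range(width - 2)) + [width - 1]))   # Det(C_r)

depth = Fraction(-Q, P)    # u_3 = -Q / P
print(P, Q, depth)
```

Here $\mathcal{P} u_3 + \mathcal{Q}$ vanishes at the shared root, so the ratio returns $u_3 = 2$ without solving either polynomial.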

In fact, $\mathcal{P}$ and $\mathcal{Q}$ can be determined from image information alone, up to an ambiguity
which is a subgroup of the projective group; $\mathcal{P}$ is the only polynomial of degree $(d - 1)(d - 2)$
that vanishes on all the singularities of the outline, and hence can be determined from image
information up to scale. In turn, $\mathcal{Q}$ is a polynomial of degree $(d - 1)(d - 2) + 1$ that vanishes
on all the singularities of the outline. There is a four dimensional space of such polynomials;
section 3.1.3 shows that the ambiguity arising from choosing one of these polynomials to
act as $\mathcal{Q}$ arbitrarily is just a projective transformation of the contour generator.

The constraints that singularities lie on curves of particular degrees determine the family
of curves that are generic outlines of smooth surfaces, according to a result of [5] which
states that:

Theorem: (D'Almeida) Let $\Gamma$ be a plane curve of degree $n(n - 1)$, $n \geq 3$. The
necessary and sufficient condition that there exists a smooth surface $S \subset P^3$ and
a generic point $p$ of $P^3$ such that $\Gamma$ is the curve of ramification of the projection
of $S$ through $p$, is as follows:

The curve $\Gamma$ has $d = n(n - 1)(n - 2)(n - 3)/2$ ordinary double points, $k =
n(n - 1)(n - 2)$ cusps and no other singularities. There are two curves $p_0$
and $p_1$ of degrees $n^2 - 3n + 2$ and $n^2 - 3n + 3$ respectively, without a common
component, that pass through the singular points of $\Gamma$. The minimal degree of a
plane curve containing the singular points of $\Gamma$ is $n^2 - 3n + 2$.

Note  that  the  “curve  of  ramification”  is  equivalent  to  our  outline.  Any  errors  in  translation 
are  mine. 

2.3  Further  global  properties  of  the  outline 

The  study  of  outlines  is  quite  rich  in  curious  geometric  properties;  in  particular,  the  form  of 
a  generic  outline  is  strongly  constrained,  and  the  projective  invariants  of  a  generic  outline 
must satisfy constraints. For example, note that there must exist a frame in which the
outline of a cubic surface has the form $Q^3 + C^2 = 0$, where $Q$ is quadratic and $C$ is cubic.

This can be shown by representing the surface as

$$ S(u_0, u_1, u_2, u_3) = H_0(u_0, u_1, u_2) u_3^3 + H_1(u_0, u_1, u_2) u_3^2 + H_2(u_0, u_1, u_2) u_3 + H_3(u_0, u_1, u_2) $$

By choice of frame, the focal point can be given coordinates $(0, 0, 0, 1)$ and we can ensure
$H_1(u_0, u_1, u_2) = 0$ identically; divide by $H_0$ (which is a constant), to get the form

$$ S(u_0, u_1, u_2, u_3) = u_3^3 + H_2(u_0, u_1, u_2) u_3 + H_3(u_0, u_1, u_2) $$

The polar through the focal point is now

$$ T(u_0, u_1, u_2, u_3) = 3 u_3^2 + H_2(u_0, u_1, u_2) $$

The resultant with respect to $u_3$ has degree six, and consists of terms formed from $H_3$
(degree 3) and $H_2$ (degree 2), and so must have the form $Q^3 + C^2 = 0$, for an appropriate
choice of $C$ and $Q$.
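
The shape of this resultant can be checked directly. The following sketch (an illustration under the assumption that the resultant is computed as a Sylvester determinant; the helper names are hypothetical) treats the values of $H_2$ and $H_3$ at an image point as integers `p` and `q`, and confirms that the resultant of $u_3^3 + p u_3 + q$ and $3 u_3^2 + p$ is $4p^3 + 27q^2$, which has the stated form once the constants are absorbed into $Q$ and $C$.

```python
def det(m):
    """Exact determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        if entry != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * entry * det(minor)
    return total


def sylvester_matrix_cubic(p, q):
    """Sylvester matrix of f = u^3 + p*u + q and g = 3*u^2 + p in the variable u."""
    f = [1, 0, p, q]      # coefficients of f, highest power first
    g = [3, 0, p]         # coefficients of g, highest power first
    rows = [[0] * k + f + [0] * (1 - k) for k in range(2)]    # deg g = 2 rows of f
    rows += [[0] * k + g + [0] * (2 - k) for k in range(3)]   # deg f = 3 rows of g
    return rows


def outline_value(p, q):
    """Resultant of S and T with respect to u_3 at a point where H_2 = p, H_3 = q."""
    return det(sylvester_matrix_cubic(p, q))


for p in range(-3, 4):
    for q in range(-3, 4):
        assert outline_value(p, q) == 4 * p ** 3 + 27 * q ** 2
```

The outline of the cubic in this frame is therefore $4 H_2^3 + 27 H_3^2 = 0$, a sextic, and its six cusps are the common zeros of $H_2$ and $H_3$.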

Similar  statements  are  possible  about  the  outlines  of  surfaces  of  higher  degree,  but  the 
form  of  the  constraint  becomes  more  complex;  a  possible  benefit  of  such  a  result  includes 
controlling  the  complexity  of  the  fitting  problem  -  most  algebraic  curves  are  not  outlines. 

3  Obtaining  the  contour  generator  from  the  outline 

Determining the contour generator from the outline requires knowledge of the "depth"
to the contour generator at each point of the outline. The previous sections indicated how
this depth is to be found, by showing an expression that gives the homogeneous coordinate
$u_3$ as a rational function of the other three homogeneous coordinates on the outline. In
particular, this rational function can be determined from the singularities of the outline
using the property that both numerator and denominator vanish at the singularities of the
outline. This means that the expression for $u_3$ is undetermined at these points. Surprisingly,
this is a useful property, because it makes it possible to obtain a non-singular space curve
from a singular plane curve. In particular, the process sketched above for determining the
contour generator from the outline is widespread in algebraic geometry, and is known as
"blowing up." This section provides some simpler examples of blowing up to demonstrate
how the process can "undo" singularities; it then shows that the reconstruction of the
contour generator is correct by showing that it is the only possible such reconstruction,
up to a projective transformation of space. This latter result requires some complicated
machinery, which is briefly introduced.


3.1  Blowing  up 

The  outline  has  only  cusps  and  double  points  as  singularities,  by  the  assumption  that  both 
surface  and  viewing  position  are  generic.  This  means  that  there  is  no  need  to  blow  up 
more complex singularities. In the case of blowing up cusps or double points, the central
issue is to find a depth function that can be evaluated along the plane curve, giving the
coordinates of the space curve in affine coordinates as

$$ (x, y, F(x, y) / G(x, y)) $$

or in homogeneous coordinates as

$$ (x_0, x_1, x_2, x_3) = (G(x_0, x_1, x_2) x_0,\; G(x_0, x_1, x_2) x_1,\; G(x_0, x_1, x_2) x_2,\; F(x_0, x_1, x_2)) $$

Clearly, in the case of homogeneous coordinates the degree of $G$ is one less than the degree
of $F$. The depth function must have two values at each double point (these are given as
limits as a point on the curve approaches the double point), so as to construct two points at
different depths in space that correspond to the single point in the image. As the following
two examples show, this is achieved by having the depth function undefined (0/0) at the
singularities, with appropriate limiting properties close to the singularities; in the case of
homogeneous coordinates, all four homogeneous coordinates vanish simultaneously, again with
appropriate limiting properties.

3.1.1  Example:  blowing  up  a  double  point  in  the  affine  plane: 

Consider the curve given by $x^2 - y^2 - y^3 = 0$, which has a double point at the origin where
the curve crosses itself transversally. The curve can be parametrised as

$$ (x, y) = (t(t^2 - 1),\; t^2 - 1) $$

where $t$ is some complex parameter. The curve passes through the double point when $t = 1$
or $t = -1$.

The function

$$ f(x, y) = (x, y, x/y) $$

which takes a point in the plane to a point in space, is undefined at the origin; furthermore,

$$ \lim_{t \to 0} f(t \cos\theta, t \sin\theta) $$

depends on $\theta$, so that when the function is applied to a curve approaching the origin, the
value of the $z$-coordinate depends on the direction of the approach. In particular, applying
this function to the curve under consideration produces

$$ (x, y, z) = (t(t^2 - 1),\; t^2 - 1,\; t) $$

less the points $t = 1$ and $t = -1$, where the function is not defined. However, at these points
the space curve has meaningful limits, which are $(0, 0, 1)$ and $(0, 0, -1)$. By attaching these
limit points we obtain a smooth space curve from a singular plane curve.
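
The limiting behaviour can be observed numerically with exact rational arithmetic. The sketch below is illustrative: it assumes the parametrisation $(t(t^2 - 1), t^2 - 1)$ of the nodal cubic $x^2 - y^2 - y^3 = 0$, and the function names are made up for the example. It evaluates the lifted curve at parameter values approaching $t = 1$ and $t = -1$.

```python
from fractions import Fraction


def curve_point(t):
    """Point of the plane curve for parameter t (assumed parametrisation)."""
    return (t * (t * t - 1), t * t - 1)


def on_curve(x, y):
    """True when (x, y) satisfies x^2 - y^2 - y^3 = 0."""
    return x * x - y * y - y ** 3 == 0


def lift(x, y):
    """The map f(x, y) = (x, y, x / y); undefined where y = 0."""
    return (x, y, x / y)


for t0 in (1, -1):
    for k in (1, 2, 4, 8):
        t = t0 + Fraction(1, 10 ** k)      # parameter value close to the double point
        x, y = curve_point(t)
        assert on_curve(x, y)
        lifted = lift(x, y)
        # x and y shrink towards 0 while the third coordinate tends to t0.
        print(t0, float(lifted[0]), float(lifted[1]), float(lifted[2]))
```

The two branches through the origin are separated in depth: one approaches height $1$ and the other height $-1$.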

3.1.2  Example:  blowing  up  a  cusp  in  the  projective  plane: 

In the projective plane, points are given by three homogeneous coordinates. In this case, a
polynomial cannot be a function, because scaling each homogeneous coordinate changes the
value of the polynomial without changing the point at which the function is defined. Thus,
functions are given by ratios of homogeneous polynomials of the same degree in homogeneous
coordinates. In fact, a function that maps a curve in the projective plane to a curve in
projective space can be given as four homogeneous polynomials of the same degree in the
homogeneous coordinates of the plane; in this form, each polynomial represents a homogeneous
coordinate in space.

Consider the curve given by $x_0^3 - x_2 x_1^2 = 0$ in the projective plane; this curve has a
cusp at $(0, 0, 1)$, and can be parametrised as $(r^2 s, r^3, s^3)$, where $(r, s)$ are the homogeneous
coordinates of a point on the projective line. Consider the following function from the
projective plane to projective three-space:

$$ f(x_0, x_1, x_2) = (x_0^2,\; x_1 x_0,\; x_1 x_2,\; x_0 x_2) $$

Applied to the curve, this function yields the parametric space curve given in homogeneous
coordinates by:

$$ (r^4 s^2,\; r^5 s,\; r^3 s^3,\; r^2 s^4) $$

which, after removing the common factor $r^2 s$, is equivalent to that given in homogeneous coordinates by:

$$ (r^2 s,\; r^3,\; r s^2,\; s^3) $$

This curve is a twisted cubic - this is perhaps easiest to see by dividing by the fourth
coordinate, writing $r/s = t$ and ignoring the point at infinity, giving the curve in affine
(non-homogeneous) coordinates as $(t^2, t^3, t)$; this space curve has no singularities.
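
The algebra of this example is easy to confirm mechanically. The sketch below is an illustration that assumes the parametrisation $(r^2 s, r^3, s^3)$ of the cuspidal cubic; the function names are hypothetical. It checks on a grid of integer parameter values that the map sends the plane curve to $r^2 s$ times the twisted cubic.

```python
def plane_curve(r, s):
    """Point (x0, x1, x2) of the cuspidal cubic for homogeneous parameter (r, s)."""
    return (r * r * s, r ** 3, s ** 3)


def on_cuspidal_cubic(x0, x1, x2):
    """True when x0^3 - x2 * x1^2 = 0."""
    return x0 ** 3 - x2 * x1 ** 2 == 0


def blow_up_map(x0, x1, x2):
    """The map f(x0, x1, x2) = (x0^2, x1*x0, x1*x2, x0*x2)."""
    return (x0 * x0, x1 * x0, x1 * x2, x0 * x2)


def twisted_cubic(r, s):
    """The space curve (r^2 s, r^3, r s^2, s^3)."""
    return (r * r * s, r ** 3, r * s * s, s ** 3)


for r in range(-4, 5):
    for s in range(-4, 5):
        x = plane_curve(r, s)
        assert on_cuspidal_cubic(*x)
        common = r * r * s     # factor shared by all four coordinates
        assert blow_up_map(*x) == tuple(common * c for c in twisted_cubic(r, s))

# At the cusp (r = 0) the map itself degenerates to (0, 0, 0, 0),
# while the twisted cubic supplies the well-defined limit point.
print(blow_up_map(*plane_curve(0, 1)), twisted_cubic(0, 1))
```

The common factor $r^2 s$ vanishes at the cusp, which is exactly where the map degenerates; dividing it out supplies the missing point $(0, 0, 0, 1)$.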

3.1.3  Blowing  up  the  outline 

The key to blowing up a curve with double points and cusps, as the examples have shown,
is to obtain a depth function that takes the form 0/0 at the double points and cusps of the curve. For
the outline of an algebraic surface, such a function is easily available.
Recall from section 2.2 that there exist two polynomials, $\mathcal{P}$ of degree $(d - 1)(d - 2)$, and $\mathcal{Q}$ of
degree $(d - 1)(d - 2) + 1$ (with no factor in common with $\mathcal{P}$), both of which vanish on the
singularities of the outline. These polynomials yield the necessary depth function.

Figure 2: Four frames from a fly-by, showing a plane curve with a double point and its
blow-up. The blown-up curve is a non-singular space curve, shown here lying above the
plane curve. It projects to the plane curve under orthographic projection in this case.

Because the reconstruction is proceeding up to a projective ambiguity, it is possible to
choose a focal point; choose this focal point to be $(0, 0, 0, 1)$, to simplify the working. Now
the contour generator is some curve $(u_0(t), u_1(t), u_2(t), u_3(t))$, and the outline in the image
plane consists of the curve $(x_0(t), x_1(t), x_2(t)) = (u_0(t), u_1(t), u_2(t))$. Reconstructing the
contour generator consists, in effect, of supplying the missing $u_3(t)$. However, from the
previous section, $\mathcal{P} u_3 + \mathcal{Q} = 0$ on the contour generator, and $\mathcal{P}$ and $\mathcal{Q}$ are expressions in
$u_0$, $u_1$, and $u_2$ alone, which can be determined from the image information, so that $u_3$ can
be determined at each point on the curve.
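
For the cubic surface in the reduced form $S = u_3^3 + H_2 u_3 + H_3$ of section 2.3, the depth function can be written out explicitly. The following worked example is added for illustration; it uses the matrix construction of section 2.2 with $a = (1, 0, H_2, H_3)$ and $b = (3, 0, H_2)$, and cross-checks the result by eliminating $u_3^2$ directly.

```latex
% Illustrative case: cubic surface, focal point (0, 0, 0, 1).
\[
C = \begin{pmatrix}
1 & 0 & H_2 & H_3 \\
3 & 0 & H_2 & 0 \\
0 & 3 & 0 & H_2
\end{pmatrix},
\qquad
\mathcal{P} = \det\begin{pmatrix} 1 & 0 & H_2 \\ 3 & 0 & H_2 \\ 0 & 3 & 0 \end{pmatrix} = 6 H_2,
\qquad
\mathcal{Q} = \det\begin{pmatrix} 1 & 0 & H_3 \\ 3 & 0 & 0 \\ 0 & 3 & H_2 \end{pmatrix} = 9 H_3 .
\]
\[
u_3 = -\frac{\mathcal{Q}}{\mathcal{P}} = -\frac{3 H_3}{2 H_2}.
\]
% Direct check: on the contour generator 3 u_3^2 + H_2 = 0, so u_3^2 = -H_2 / 3, and
% S = u_3 (u_3^2 + H_2) + H_3 = (2 H_2 / 3) u_3 + H_3 = 0 gives the same u_3.
```

Both $\mathcal{P}$ and $\mathcal{Q}$ vanish where $H_2 = H_3 = 0$, which are the six cusps of the sextic outline; at those points the depth takes the form 0/0, as the blowing-up construction requires.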

In particular, given an outline in the projective plane, apply the map

$$ (x_0, x_1, x_2) \mapsto (x_0 \mathcal{P},\; x_1 \mathcal{P},\; x_2 \mathcal{P},\; \mathcal{Q}) $$

taking every point on the outline to a point in space. For convenience, call this map the
"lifting map". At the singular points of the outline, the lifting map degenerates (as both
$\mathcal{P}$ and $\mathcal{Q}$ vanish at these points, the image of these points in the map given is $(0, 0, 0, 0)$,
which is not a meaningful point in projective space). The result of the following section
shows that the closure (required to fill in the missing points where the map degenerates) of
the image of the outline in the lifting map must be the contour generator.

The lifting map has further useful properties; in particular, it has the property that

$$ \pi \circ \mathrm{Lift} = \mathrm{Identity} $$

where $\pi$ is projection through the point $(0, 0, 0, 1)$. In coordinates, drop the fourth homogeneous
coordinate, so that

$$ \pi \circ \mathrm{Lift} : (x_0, x_1, x_2) \mapsto (x_0 \mathcal{P}, x_1 \mathcal{P}, x_2 \mathcal{P}) = (x_0, x_1, x_2) $$

This means that, if Lift takes the outline to the contour generator, it does so with a notion
of the appropriate focal point through which to project the contour generator back on to the
outline - the particular lift constructed presumes that the focal point is $(0, 0, 0, 1)$, which can
be done without loss of generality by choice of coordinates. Any other particular focal point
can be chosen as well, though the form of the resulting lift is slightly more complicated;
the important thing is that, once the lifting process has been applied, both the contour
generator and the focal point are available, in a single coordinate system. This data is
sufficient to determine the surface.

The lifting map contains an intrinsic projective ambiguity, because $\mathcal{Q}$ cannot be determined
uniquely. There are sufficient singularities for $\mathcal{P}$ to be known up to a scale -
which is not a source of ambiguity, because we are working in homogeneous coordinates -
but there is a four-dimensional space of curves of degree $(d - 1)(d - 2) + 1$ that vanish
on the singularities, spanned by $\{\mathcal{Q}, x_0 \mathcal{P}, x_1 \mathcal{P}, x_2 \mathcal{P}\}$. An element of this space is given by
$\mathcal{Q}_a = a_0 x_0 \mathcal{P} + a_1 x_1 \mathcal{P} + a_2 x_2 \mathcal{P} + a_3 \mathcal{Q}$. Now if the lifting map uses $\mathcal{Q}_a$ instead of $\mathcal{Q}$, the
resulting curve is:

$$ (x_0 \mathcal{P}, x_1 \mathcal{P}, x_2 \mathcal{P}, \mathcal{Q}_a) = (x_0 \mathcal{P}, x_1 \mathcal{P}, x_2 \mathcal{P}, \mathcal{Q}) M $$

where $M$ is the matrix:

$$ M = \begin{pmatrix}
1 & 0 & 0 & a_0 \\
0 & 1 & 0 & a_1 \\
0 & 0 & 1 & a_2 \\
0 & 0 & 0 & a_3
\end{pmatrix} $$

This is clearly just a projective transformation, as long as $a_3 \neq 0$. Since both $\mathcal{P}$ and the
whole space of possible $\mathcal{Q}_a$'s can be determined from the outline, satisfying the requirement
that $a_3 \neq 0$ simply involves choosing a $\mathcal{Q}_a$ that does not share a factor with $\mathcal{P}$, which is
easily done.

3.2  Uniqueness of the lift

The  sections  above  have  shown  constructively  that  the  outline  can  be  lifted  to  yield  the 
contour  generator,  and  have  demonstrated  a  lifting  process  that  must  yield  the  contour 
generator  from  the  outline.  It  is  also  possible  to  show  that  this  is  the  only  process  that  will 
do  so;  the  proof  is  not  novel,  and  requires  a  certain  amount  of  technical  algebraic  geometry; 
it  is  included  here  for  completeness.  Space  does  not  allow  a  comprehensive  introduction  to 
the  material  required,  but  subsection  3.2.1  introduces  the  general  approach,  and  sketches 
the  direction  that  the  mathematics  in  subsection  3.2.2  takes,  as  the  form  of  argument  used 
represents  a  powerful  tool  for  solving  questions  about  space  curves.  The  reader  is  referred  to 
[14],  which  is  difficult  but  comprehensive,  or  to  [12],  which  is  much  more  approachable  but 
less  wide-ranging.  The  reader  willing  to  accept  that  the  lifting  process  in  the  previous  section 
yields  the  contour  generator  may  wish  to  skip  both  sections,  or  read  only  subsection  3.2.1. 

3.2.1  Thrust  of  the  mathematics 

The central question is: given a projection of an abstract algebraic curve satisfying particular
constraints, in how many projectively different ways could that curve be embedded in
space, consistent with the image data? The result that will appear is that there is a natural
choice of depth function to obtain the contour generator from the outline. This result is a
statement about the possible embeddings of a curve in space that are consistent with the
image data.

Embeddings  of  curves  are  generally  attacked  through  a  technical  device  called  a  line 
bundle,  which  consists  of  a  collection  of  sets  made  up  of  the  cartesian  product  of  an  open 
set  on  the  curve  and  an  affine  line.  These  sets  are  pasted  together  in  a  precise  way  using 
transition  functions.  Transition  functions  are  associated  with  the  intersection  of  two  of 
these  sets;  their  domain  is  the  open  set  on  the  curve,  and  their  range  is  the  line.  Transition 
functions  allow  studies  of  sections  of  line  bundles,  which  associate  points  on  the  line  with 
points on the curve. Formally, a section is a map from the curve to the line bundle, so
that the projection of the map onto the first factor is the identity; this means that, in some
coordinate system, the map has the form $f : p \mapsto (p, q)$, where $p$ is a point on the curve and
$q$ is a point on the line. Where two sets intersect, there are two ways of writing each point
on the curve and each point on the line - one set of coordinates for each set. Transition
functions define the correspondence between points on the line in the coordinates associated
with the first set, and those in the coordinates associated with the second set. In fact, the
choice of transition functions yields the bundle.

The  result  is  an  object  that  locally  looks  like  a  piece  of  curve  crossed  with  the  affine  line 
(cf. the vector bundles of differential geometry). Line bundles in algebraic geometry have
more  rigidity  properties  than  the  bundles  of  differential  geometry,  for  two  reasons.  Firstly, 
the  topology  used  to  define  open  sets  is  the  Zariski  topology,  where  all  algebraic  sets  are 
closed;  this  means  that  an  open  set  on  a  curve  consists  of  the  whole  curve,  less  some  finite 
number  of  points.  Secondly,  the  bundles  under  consideration  are  typically  holomorphic  - 
this  means  that  the  transition  functions  are  analytic  in  their  domain. 

The following example (which is a modified version of example 4.7 in [12]) displays a
family of line bundles over the projective line. The first open set on the projective line will
consist of the points given in coordinates as $s$, for $s$ some complex number (henceforth,
the complex numbers will be denoted by $\mathbf{C}$); call this set $U$. This is the projective line
less one point, the point at infinity. The second open set will consist of the points given
in coordinates as $t$, for $t$ some complex parameter; call this set $V$. Again, this is the
projective line less a point (which would be the origin in $s$ coordinates). Define the change
of coordinates in crossing from $U$ to $V$ by $t = 1/s$. The two sets, pasted together in this
way, give the whole of the projective line; figure 3 illustrates how the sets are assembled
together to yield a line.

Figure  3:  Pasting  together  two  affine  lines  to  get  a  projective  line.  The  first  copy  of  the  affine 
line  parametrizes  all  the  points  on  the  projective  line  save  the  point  at  infinity;  the  second 
copy  parametrizes  the  point  at  infinity,  but  lacks  the  origin.  These  two  copies  intersect 
almost  everywhere;  they  are  pasted  together  by  specifying  how  the  coordinate  of  a  given 
point on one set relates to the coordinate of the equivalent point on the other.

The line bundle will consist of the sets $U \times \mathbf{C}$ and $V \times \mathbf{C}$, pasted together in an appropriate
way. The set $U \cap V$ consists of the whole line, less two points. There must be two transition
functions: $f_{VU}$, which takes coordinates on the line in $U$'s frame to those in $V$'s frame, and
$f_{UV}$, which takes coordinates on the line in $V$'s frame to those in $U$'s frame. Consider a
point $p$ in $U \cap V$. Write:

•  $p_U$ for the coordinates of $p$ in $U$'s frame;

•  $q_U$ for the coordinate in $U$'s frame of a point on the line $\mathbf{C}$;

•  $p_V$ for the coordinates of $p$ in $V$'s frame;

•  $q_V$ for the coordinate in $V$'s frame of the point on the line $\mathbf{C}$ that would be written
$q_U$ in $U$'s frame.

Then the pair $(p_U, q_U)$ corresponds to the pair $(p_V, q_V) = (p_V, f_{VU}(p_U)\, q_U)$. Clearly, we
have that $f_{UV} f_{VU} = 1$, and that neither function vanishes on $U \cap V$. We can now define
a family of line bundles by the transition functions $f_{VU} = t^n = s^{-n}$ and $f_{UV} = s^n = t^{-n}$.
These functions specify how the coordinates of a holomorphic section change as we move
from $U$ to $V$ and back.

In $U$, a holomorphic section of this bundle must have the form $(s, \sigma(s))$, where $\sigma$ is a
holomorphic function on $\mathbf{C}$. As a result, $\sigma$ has a representation of the form

$$\sigma(s) = \sum_{i=0}^{\infty} a_i s^i$$

on $U$. In $U \cap V$, this section must also have the representation (in the coordinate $t$ on $V$)

$$t^n \sum_{i=0}^{\infty} a_i t^{-i} = \sum_{i=0}^{\infty} a_i t^{n-i}$$

and this expression must also be holomorphic. In turn, this means that for $n < 0$, there are
no holomorphic sections. For $n \geq 0$, there are holomorphic sections which have the form
$(s, \sum_{i=0}^{n} a_i s^i)$ (for any choice of $a_i$) in $U$ and, in $V$, the form $(t, \sum_{i=0}^{n} a_i t^{n-i})$. Hence, for a
given $n$, there is an $n + 1$ dimensional vector space of holomorphic sections, corresponding
to the choice of $a_i$.
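
The change of chart in this example can be checked mechanically. The following is a minimal sketch, assuming a section is stored as its list of coefficients $a_i$ in the chart $U$; the function names (`to_v_chart`, `is_holomorphic_in_v`, `evaluate`) are illustrative and do not come from the original text.

```python
from fractions import Fraction


def to_v_chart(coeffs, n):
    """Rewrite sigma(s) = sum_i a_i s^i in the chart V, where t = 1/s.

    The transition function f_VU = t^n sends the term a_i s^i to a_i t^(n - i).
    Returns a dict mapping each exponent of t to its coefficient.
    """
    return {n - i: a for i, a in enumerate(coeffs) if a != 0}


def is_holomorphic_in_v(coeffs, n):
    """The section extends over t = 0 exactly when no negative power of t appears."""
    return all(exponent >= 0 for exponent in to_v_chart(coeffs, n))


def evaluate(terms, x):
    """Evaluate a Laurent polynomial given as {exponent: coefficient}."""
    return sum(a * x ** e for e, a in terms.items())


n = 3
sigma_u = [2, 0, -1, 5]              # 2 - s^2 + 5 s^3 in the chart U
sigma_v = to_v_chart(sigma_u, n)     # 2 t^3 - t + 5 in the chart V

# Check the transition rule at a sample point of U intersect V.
s = Fraction(3, 7)
t = 1 / s
u_value = evaluate(dict(enumerate(sigma_u)), s)
print(t ** n * u_value == evaluate(sigma_v, t))
```

Counting the monomials $s^i$ that stay holomorphic in $V$ gives the $n + 1$ dimensional space of sections described above; for negative $n$ every monomial fails the test.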

Taken on its own, a section of a line bundle has no interest for us here; however, a ratio
of holomorphic sections of a line bundle is a meromorphic (rational, with poles) function on
the curve. Thus, in the example above, with $n = 3$, there is a four dimensional vector space
of sections. The four sections given in $U$ coordinates by $(s, 1)$, $(s, s)$, $(s, s^2)$, $(s, s^3)$ can
be thought of as a map, applied to the curve, taking it to the points given in homogeneous
coordinates in space as:

$$(1, s, s^2, s^3)$$

These four distinct holomorphic sections of this line bundle map the line to the twisted cubic
in projective space, less one point; this missing point is the image of the point at infinity,
and can be obtained by evaluating these sections at the one point in $V$ that does not lie in
$U \cap V$. In general, four distinct holomorphic sections of a line bundle on a curve represent
a map taking the curve to a curve in $P^3$, and the resulting space curve is algebraic. Note
that sections can be added to one another or multiplied by constants, so that one usually
considers the linear span of a set of sections. The attractive features of line bundles as tools
are illustrated by our example:

•  there are “few” holomorphic sections;

•  it is very often possible to tell “how many” holomorphic sections there are;

•  a bundle that has $n$ independent holomorphic sections represents a map taking the
curve to a curve in $P^{n-1}$.

As  a  result  of  these  properties,  line  bundles  are  a  central  tool  in  studying  embeddings  of 
algebraic  curves. 
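
The twisted cubic example can be verified directly. The sketch below assumes the transition function $f_{VU} = t^3$ from the example with $n = 3$, and uses the three standard quadrics that cut out the twisted cubic; the function names are illustrative only.

```python
from fractions import Fraction


def cubic_in_u(s):
    """Image of the point s of chart U under the sections 1, s, s^2, s^3."""
    return (1, s, s ** 2, s ** 3)


def cubic_in_v(t):
    """The same four sections in chart V: multiply by t^3 and substitute s = 1/t."""
    return (t ** 3, t ** 2, t, 1)


def on_twisted_cubic(x):
    """Test the three quadrics that vanish on the twisted cubic."""
    x0, x1, x2, x3 = x
    return x0 * x2 == x1 ** 2 and x1 * x3 == x2 ** 2 and x0 * x3 == x1 * x2


def same_projective_point(x, y):
    """Two homogeneous 4-tuples agree when every 2x2 minor vanishes."""
    return all(x[i] * y[j] == x[j] * y[i]
               for i in range(4) for j in range(i + 1, 4))


s = Fraction(5, 2)
print(on_twisted_cubic(cubic_in_u(s)))
print(same_projective_point(cubic_in_u(s), cubic_in_v(1 / s)))

# The point missing from chart U is t = 0 in chart V.
print(cubic_in_v(0), on_twisted_cubic(cubic_in_v(0)))
```

The two charts give the same projective point wherever both are defined, and the chart $V$ supplies the image of the point at infinity.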

It can be seen from the example that different line bundles represent embeddings with
different properties; we shall be concerned with a line bundle often represented as $\mathcal{O}_C(1)$,
where $C$ is the contour generator. In the case of the projective line given above, this would
be the bundle that would result for $n = 1$. For a general plane curve $C$, a general section
of $\mathcal{O}_C(1)$ would vanish either on a set of points where a line intersects the curve, or on a
set of points that are functionally equivalent to a linear section. In this case, functional
equivalence means that the points are given by the vanishing of $(p_1 \pi)/p_2$, where $p_1$ and $p_2$ are
homogeneous polynomials of the same degree, $\pi$ is the equation of a line, and the expression
$(p_1 \pi)/p_2$ has no poles, so that the zeros of $p_2$ must all lie on zeros of $p_1 \pi$. Clearly, if $p_1$ is the
equation of some arbitrary line and $p_2 = \pi$, this condition is satisfied. For some curves, there
are other cases that will satisfy this condition. For example, if $C$ is the outline of a surface,
then $p_1 = Q$, $p_2 = V\pi$ will also satisfy this condition, where $V$, $Q$ are the equations vanishing
on the singularities and defined above. This follows because the expression $V u_3 + Q$, which
was shown above to vanish on the contour generator, demonstrates that all the zeros of $V$
that lie on the contour generator coincide with zeros of $Q$.

For a space curve, a choice of four linearly independent sections of the bundle $\mathcal{O}_C(1)$
gives an embedding of the curve in space; in particular, this bundle admits four sections
that can be represented in coordinates as $(u_0, u_1, u_2, u_3)$ (which basically just embeds the
curve where it is in space). Four linearly independent sections chosen from the linear span
of this family would yield an embedding of the curve that is projectively equivalent to the
original curve. More interestingly, three linearly independent sections chosen from the linear
span of this family would represent a projection of this curve onto a plane through some
focal point; to recover the space curve, one would need to determine a fourth section in the
family generated as the span of $(u_0, u_1, u_2, u_3)$ (which would generate our “depth function”).
Of course, if $\mathcal{O}_C(1)$ admits more than this four dimensional vector space of sections, the
problem is hopeless, as it would not be possible to determine whether the fourth section
chosen actually lies in the span of $(u_0, u_1, u_2, u_3)$, and so one could not know without other
sources of information whether the embedding chosen corresponded to the correct one. The
crucial fact is that $\mathcal{O}_C(1)$ has only a four dimensional family of sections for $C$ a contour
generator (in fact, for $C$ a complete intersection). This means that the fourth section can
be determined from a projection of the curve up to at worst a projective ambiguity, so that
the contour generator can be recovered from the outline.

3.2.2  Mathematical details

At issue is $\mathcal{O}_C(1)$, for $C$ the contour generator; if $H^0(C, \mathcal{O}_C(1))$, which is the space of sections
of $\mathcal{O}_C(1)$, is isomorphic to $H^0(P^3, \mathcal{O}_{P^3}(1))$, then $H^0(C, \mathcal{O}_C(1))$ has dimension four.
Since the outline is birational to the contour generator, $\mathcal{O}_O(1)$ is the same as $\mathcal{O}_C(1)$, where
$O$ represents the outline; three linearly independent sections of $\mathcal{O}_O(1)$ are known (in coordinates,
$(x_0, x_1, x_2)$). If a fourth can then be determined, then $O$ can be embedded in space
using these four sections, and the result must be projectively equivalent to $C$.

Lemma 3: Given an algebraic curve $C$, which is a complete intersection in $P^3$ and is
not a plane curve, $H^0(C, \mathcal{O}_C(1))$ is isomorphic to $H^0(P^3, \mathcal{O}_{P^3}(1))$.

Proof: I am indebted to Prof. O. DeBarre, of the Mathematics Department,
University  of  Iowa,  for  pointing  out  the  following  lemma,  and  showing  me  how 
it  could  be  proven.  This  proof  largely  follows  his;  errors  or  inaccuracies  are  of 
my  own  addition.  Note  that  a  similar  fact  appears  as  an  exercise  in  [14]  (p.  188, 
ex.  8.4). 

Consider the following exact sequence of sheaves associated with the curve:

$$0 \to \mathcal{I} \to \mathcal{O}_{P^3} \to \mathcal{O}_C \to 0$$

where the symbols have their usual meaning (see, for example, [14]). Taking
the associated cohomology sequence, and twisting by 1, we obtain the following
long exact sequence:

$$0 \to H^0(P^3, \mathcal{I}(1)) \to H^0(P^3, \mathcal{O}_{P^3}(1)) \to H^0(C, \mathcal{O}_C(1)) \to H^1(P^3, \mathcal{I}(1))$$

$H^0(P^3, \mathcal{I}(1))$ represents those homogeneous linear expressions that vanish on
the curve, and must be empty because the curve does not lie in any plane.
$H^0(P^3, \mathcal{O}_{P^3}(1))$ represents the hyperplanes in $P^3$ and $H^0(C, \mathcal{O}_C(1))$ represents
the space of sections of the line bundle given by a hyperplane section of $C$. If we
can prove that $H^1(P^3, \mathcal{I}(1))$ is empty, we have that $H^0(C, \mathcal{O}_C(1))$ is isomorphic
to the system of hyperplanes in $P^3$, and so that the sections of this bundle form
a four-dimensional space.

The curve is a complete intersection, given (say) by $p = 0$, $q = 0$, for polynomials
$p$ and $q$. As a result, we have the following free resolution of its ideal:

$$0 \to R \to R \oplus R \to I \to 0$$

where $R$ is the graded ring of homogeneous polynomials in four variables over
the complex numbers, and $I$ is the curve's ideal. In this sequence, the injection
$R \to R \oplus R$ is given by $f \mapsto (-pf, qf)$, and the surjection $R \oplus R \to I$ is given
by $(a, b) \mapsto qa + pb$. Keeping track of the grading, with $m$ and $n$ the degrees of
the two defining polynomials, we find:

$$0 \to R(1 - m - n) \to R(1 - m) \oplus R(1 - n) \to I(1) \to 0$$

This free resolution yields the exact sequence of line bundles:

$$0 \to \mathcal{O}_{P^3}(1 - m - n) \to \mathcal{O}_{P^3}(1 - m) \oplus \mathcal{O}_{P^3}(1 - n) \to \mathcal{I}(1) \to 0$$

Taking the associated cohomology sequence, and recalling the standard result
that

$$H^i(P^n, \mathcal{O}_{P^n}(j)) = 0$$

for $0 < i < n$ and for all $j \in \mathbf{Z}$ ([14], p. 225), gives that $H^1(P^3, \mathcal{I}(1))$ is empty,
and so we have:

$$0 \to H^0(P^3, \mathcal{O}_{P^3}(1)) \to H^0(C, \mathcal{O}_C(1)) \to 0$$

that is, the two are isomorphic.
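
The vanishing of $H^1(P^3, \mathcal{I}(1))$ is obtained in one step from the twisted resolution. The following spells out that step as a sketch of the standard argument; it is not reproduced from the original proof.

```latex
% Segment of the long exact cohomology sequence of
% 0 -> O(1-m-n) -> O(1-m) (+) O(1-n) -> I(1) -> 0 on P^3:
\[
H^1\bigl(P^3, \mathcal{O}_{P^3}(1-m) \oplus \mathcal{O}_{P^3}(1-n)\bigr)
\to H^1\bigl(P^3, \mathcal{I}(1)\bigr)
\to H^2\bigl(P^3, \mathcal{O}_{P^3}(1-m-n)\bigr)
\]
% Both outer groups vanish, because H^i(P^3, O(j)) = 0 for 0 < i < 3 and all j,
% so exactness forces the middle group to vanish as well:
\[
H^1\bigl(P^3, \mathcal{I}(1)\bigr) = 0 .
\]
```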

Lemma 4: The expression

$$\frac{Q}{V}$$

where $Q$, $V$ are the polynomials, given in section 2.2, that vanish on all the singular points of
the outline $O$, represents in coordinates an element of $H^0(O, \mathcal{O}_O(1))$, and hence an element
of $H^0(C, \mathcal{O}_C(1))$, for $C$ the contour generator.

Proof: We have shown above that

$$V u_3 + Q = 0$$

on the contour generator; this is sufficient.

3.2.3  Summary 

Given the outline $O$ of a surface, the space curve $C$ given by applying the map

$$(x_0, x_1, x_2) \mapsto (x_0 V, x_1 V, x_2 V, Q)$$

where $V$ and $Q$ are polynomials that can be determined by an overconstrained fitting process
from the singularities of the outline, is the contour generator of the surface when it is viewed
from the point $(0, 0, 0, 1)$. Applying this map to a large number of points on the outline
yields  a  set  of  points  lying  on  the  contour  generator.  As  a  result,  the  equations  that  vanish 
on  the  contour  generator  can  be  determined  using  a  fitting  process.  Amongst  this  collection 
of  equations  lies  the  equation  of  a  surface,  projectively  equivalent  to  the  original  surface. 
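
The lifting map is simple to apply once $V$ and $Q$ are available. The sketch below assumes they are supplied as Python callables; `V_toy` and `Q_toy` are placeholder polynomials of degrees 1 and 2, chosen only so that the map is homogeneous, and are not fitted to the singularities of any real outline.

```python
from fractions import Fraction


def lift(point, V, Q):
    """Apply (x0, x1, x2) -> (x0 V, x1 V, x2 V, Q) to one outline point."""
    x0, x1, x2 = point
    v = V(x0, x1, x2)
    return (x0 * v, x1 * v, x2 * v, Q(x0, x1, x2))


def project(space_point):
    """Project from the focal point (0, 0, 0, 1): drop the last coordinate."""
    return space_point[:3]


def proportional(x, y):
    """Homogeneous coordinate tuples agree when every 2x2 minor vanishes."""
    return all(x[i] * y[j] == x[j] * y[i]
               for i in range(len(x)) for j in range(i + 1, len(x)))


def V_toy(x0, x1, x2):
    # Placeholder of degree 1 (hypothetical, for illustration only).
    return x0 + 2 * x1


def Q_toy(x0, x1, x2):
    # Placeholder of degree 2, one more than V_toy, as the map requires.
    return x1 * x2 - x0 * x0


p = (Fraction(1), Fraction(2), Fraction(3))
lifted = lift(p, V_toy, Q_toy)
print(lifted)
print(proportional(project(lifted), p))
```

Projecting a lifted point from $(0, 0, 0, 1)$ returns the original outline point, and rescaling the input rescales the output, so the map is well defined on projective points.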

4  Obtaining  the  surface  from  the  contour  generator 

The  previous  sections  showed  that  it  is  possible  to  take  the  image  outline  of  an  algebraic 
surface  and  obtain  a  point  in  space  and  a  space  curve,  which  are  respectively  the  focal  point 
and  the  contour  generator  that  gave  rise  to  the  outline,  and  are  in  the  same  coordinate 
frame  -  that  is,  the  outline  is  obtained  by  projecting  the  reconstructed  contour  generator 
through  the  reconstructed  focal  point.  The  contour  generator  and  focal  point  resulting  from 
this  reconstruction  are  projectively  equivalent  to  the  original  contour  generator  and  focal 
point. 

Once the contour generator through a particular focal point is known, it is a relatively
simple matter to obtain the surface, because of the strong relationship between the polynomials
that vanish on the contour generator. There is one equation of degree $d - 1$ that
vanishes on the contour generator; if there were more, the degree of the curve would be
$(d - 1)^2$ or less, whereas the contour generator has degree $d(d - 1)$. This equation is the
first polar of the surface through $(0, 0, 0, 1)$; the coefficients of
this equation can be determined from a set of points on the contour generator by a fitting
process. Call this equation $T_m$.

There is a five-dimensional linear space of equations of degree $d$ that vanish on the
contour generator, and this space can be determined by a fitting process. The equations lie
in the linear space spanned by $(u_0 T_m, u_1 T_m, u_2 T_m, u_3 T_m, S)$. The fitting process will yield a basis
for this space; call the elements of this basis $(B_0, B_1, B_2, B_3, B_4)$. Since

$$T_m = \frac{\partial S}{\partial u_3}$$

and $S$ lies in the span of this basis, it follows that

$$S = \sum_{i=0}^{4} \nu_i B_i \qquad (1)$$

for some set of constants $\nu_i$ and that

$$T_m = \sum_{i=0}^{4} \nu_i \frac{\partial B_i}{\partial u_3} \qquad (2)$$

Clearly, this equation is true in coefficients. The coefficients of $T_m$ are known, as are the
coefficients of the $B_i$, and hence those of their partial derivatives. Since $T_m$ must have at least 10
coefficients for the problem to be interesting ($S$ must have degree 3 or greater for the result
to be non-trivial), the terms $\nu_i$ can be determined from equation 2. Once the $\nu_i$ are known,
$S$ can be reconstructed from equation 1. There must be at least one solution, because the
curve is known to be a contour generator. In the general case, this is the only solution.
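
Equations 1 and 2 amount to a small linear solve. The sketch below works with exact rational arithmetic on a synthetic cubic, assuming that the basis $B_i$ and the polar $T_m$ are known exactly; in practice both come from fitting and $T_m$ is known only up to scale. Polynomials are stored as dictionaries from exponent tuples to coefficients, and all names (`recover_surface`, `solve`, the mixing matrix `M`) are illustrative.

```python
from fractions import Fraction


def add(p, q, c=1):
    """Return p + c*q; polynomials are dicts {(e0, e1, e2, e3): coefficient}."""
    r = dict(p)
    for mono, coeff in q.items():
        value = r.get(mono, 0) + c * coeff
        if value == 0:
            r.pop(mono, None)
        else:
            r[mono] = value
    return r


def combo(coeffs, polys):
    """Linear combination sum_i coeffs[i] * polys[i]."""
    total = {}
    for c, p in zip(coeffs, polys):
        total = add(total, p, c)
    return total


def times_var(p, k):
    """Multiply p by the variable u_k."""
    return {m[:k] + (m[k] + 1,) + m[k + 1:]: c for m, c in p.items()}


def d_du3(p):
    """Partial derivative with respect to u_3."""
    return {m[:3] + (m[3] - 1,): c * m[3] for m, c in p.items() if m[3] > 0}


def solve(columns, target):
    """Solve sum_i nu_i * columns[i] = target, coefficient by coefficient."""
    monos = sorted(set(target).union(*columns))
    rows = [[Fraction(col.get(m, 0)) for col in columns] + [Fraction(target.get(m, 0))]
            for m in monos]
    n = len(columns)
    for j in range(n):  # exact Gauss-Jordan elimination
        pivot = next((r for r in range(j, len(rows)) if rows[r][j] != 0), None)
        if pivot is None:
            raise ValueError("rank-deficient system: the surface is not generic")
        rows[j], rows[pivot] = rows[pivot], rows[j]
        scale = rows[j][j]
        rows[j] = [x / scale for x in rows[j]]
        for r in range(len(rows)):
            if r != j and rows[r][j] != 0:
                factor = rows[r][j]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[j])]
    if any(row[n] != 0 for row in rows[n:]):
        raise ValueError("inconsistent system: the curve is not a contour generator")
    return [rows[j][n] for j in range(n)]


def recover_surface(B, T):
    """Equation 2 gives the nu_i; equation 1 then gives S."""
    nu = solve([d_du3(b) for b in B], T)
    return combo(nu, B)


# Synthetic cubic S = u0^3 + u1^3 + u2^3 + u3^3 + u0 u1 u3 and its polar T.
S = {(3, 0, 0, 0): 1, (0, 3, 0, 0): 1, (0, 0, 3, 0): 1, (0, 0, 0, 3): 1, (1, 1, 0, 1): 1}
T = d_du3(S)
G = [times_var(T, k) for k in range(4)] + [S]
# An arbitrary invertible mixing, standing in for the basis a fitter would return.
M = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 1, 1], [1, 0, 0, 0, 2]]
B = [combo(row, G) for row in M]
print(recover_surface(B, T) == S)
```

The `ValueError` for a rank-deficient system corresponds to the non-generic case excluded by Lemma 5: the Fermat cubic $u_0^3 + u_1^3 + u_2^3 + u_3^3$, for instance, triggers it, because rescaling its $u_3^3$ term leaves the polar unchanged up to scale.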

Lemma 5: For $S$ a generic surface viewed from a given focal point $f$, there is no other
surface $S'$, such that the contour generator of $S'$ viewed from $f$ is the same curve as the
contour generator of $S$ viewed from $f$.


Proof:  The  process  that  forms  the  contour  generator  is  covariant.  It  is  therefore 
sufficient  to  demonstrate  that  this  lemma  holds  for  a  particular  focal  point.  This 
focal  point  can  be  chosen  to  be  (0, 0, 0, 1). 

If there are two different surfaces, whose equations are $S$ and $S'$, which have
the same contour generator when viewed through this focal point, the tangency
relation that defines this contour generator must be the same for both surfaces,
as the contour generator has degree $d(d - 1)$, and so only one form of degree
$d - 1$ can vanish on it. This can be written as:

$$\frac{\partial S'}{\partial u_3} = \lambda_0 \frac{\partial S}{\partial u_3}$$

where $\lambda_0$ is an unknown constant to allow for scaling the equations (which does
not affect the geometry of the underlying curve).

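Furthermore, we have that the linear system of five degree $d$ forms that vanishes
on the contour generator is the same for each surface. Thus, in particular, there
are constants $\lambda_i$ such that

$$S' = \lambda_1 u_0 \frac{\partial S}{\partial u_3} + \lambda_2 u_1 \frac{\partial S}{\partial u_3} + \lambda_3 u_2 \frac{\partial S}{\partial u_3} + \lambda_4 u_3 \frac{\partial S}{\partial u_3} + \lambda_5 S$$

As a result, we can write:

$$\frac{\partial S'}{\partial u_3} = \lambda_1 u_0 \frac{\partial^2 S}{\partial u_3^2} + \lambda_2 u_1 \frac{\partial^2 S}{\partial u_3^2} + \lambda_3 u_2 \frac{\partial^2 S}{\partial u_3^2} + \lambda_4 u_3 \frac{\partial^2 S}{\partial u_3^2} + (\lambda_4 + \lambda_5) \frac{\partial S}{\partial u_3}$$

This can be rewritten as:

$$\lambda_0 \frac{\partial S}{\partial u_3} = \lambda_1 u_0 \frac{\partial^2 S}{\partial u_3^2} + \lambda_2 u_1 \frac{\partial^2 S}{\partial u_3^2} + \lambda_3 u_2 \frac{\partial^2 S}{\partial u_3^2} + \lambda_4 u_3 \frac{\partial^2 S}{\partial u_3^2} + (\lambda_4 + \lambda_5) \frac{\partial S}{\partial u_3}$$

By rearranging terms, and setting $\mu_0 = \lambda_1$, $\mu_1 = \lambda_2$, $\mu_2 = \lambda_3$, $\mu_3 = \lambda_4$, $\mu_4 =
\lambda_5 + \lambda_4 - \lambda_0$, we obtain:

$$\mu_0 u_0 \frac{\partial^2 S}{\partial u_3^2} + \mu_1 u_1 \frac{\partial^2 S}{\partial u_3^2} + \mu_2 u_2 \frac{\partial^2 S}{\partial u_3^2} + \mu_3 u_3 \frac{\partial^2 S}{\partial u_3^2} + \mu_4 \frac{\partial S}{\partial u_3} = 0$$

For the case that $S' = \lambda_0 S$, we must have that $\lambda_1 = 0$, $\lambda_2 = 0$, $\lambda_3 = 0$, $\lambda_4 = 0$,
and $\lambda_5 = \lambda_0$, so that all the $\mu_i$ must vanish, and the equation is trivially true.
If we have $S'$, $S$ where $S' \neq \lambda_0 S$, then there must be some solution for the
above equation where not all $\mu_i$ vanish. This yields an overdetermined system
of equations in the coefficients of $S$, where the $\mu_i$ are unknown. For these
equations to be satisfied, the determinants of the coefficient matrices, which are
easily shown to be non-trivial expressions in the coefficients of $S$ alone, must
vanish. In turn, these determinants represent constraints that the coefficients of
$S$ must satisfy, and so $S$ is not a general surface.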


5  Geometric  ambiguities 

The discussion above assumed abstract projection. Because the focal point for the reconstruction
and the sections of the line bundle used to lift the outline were chosen arbitrarily,
it is not surprising that the best possible reconstruction is up to a projective transformation.
However, this leaves a substantial ambiguity in the surface's geometry. It is often the
case that the internal parameters of a camera are fully or partially known, and one might
hope that a better reconstruction is possible in this case. Surprisingly, unless a modelbase
is available, a better reconstruction appears impossible.

Consider a calibrated camera, where, without loss of generality, the focal point lies at
$(0, 0, 0, 1)$. The outline of an object is formed by a cone of rays through this point, and
tangent to the object itself. The intrinsic ambiguity of the reconstruction process must
include all transformations of the object that fix this cone of rays and the focal point - this
is the group of dilations of space, written as:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ a & b & c & d \end{pmatrix}$$

where $d \neq 0$. If there is no modelbase, then the geometry of the surface observed must be
given as a set of invariants to some transformation group. In particular, in most conceivable
applications the description must be invariant to Euclidean transformations of space. It is
easily verified* that the smallest subgroup of the projective group that contains both the
Euclidean group and the dilations is the projective group itself. This means that to describe
algebraic surfaces by invariants using only outline information and without reference to a
modelbase, one must use projective invariants, whether the camera is calibrated or not.
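
The defining property of these dilations can be checked numerically. The sketch below assumes points are homogeneous 4-tuples and that projection from the focal point $(0, 0, 0, 1)$ simply drops the last coordinate; the function names are illustrative.

```python
def dilation(a, b, c, d):
    """Matrix of a dilation of space fixing every ray through (0, 0, 0, 1)."""
    if d == 0:
        raise ValueError("d must be non-zero")
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [a, b, c, d]]


def apply(M, x):
    """Apply a 4x4 matrix to a homogeneous point."""
    return tuple(sum(M[i][j] * x[j] for j in range(4)) for i in range(4))


def compose(M, N):
    """Matrix product M N."""
    return [[sum(M[i][k] * N[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def project(x):
    """Project from the focal point (0, 0, 0, 1): drop the last coordinate."""
    return x[:3]


D = dilation(2, -1, 3, 5)
p = (1, 2, 3, 4)
print(project(apply(D, p)) == project(p))   # the image point is unchanged
print(apply(D, (0, 0, 0, 1)))               # the focal point, up to scale
print(compose(D, dilation(1, 1, 1, 2)))     # again of the same form
```

Every such matrix moves a point only along its ray through the focal point, which is why the outline cannot distinguish a surface from any of its dilated copies.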

However,  a  modelbase  changes  the  ambiguity  substantially.  If,  for  example,  there  is 
a  discrete  modelbase  with  a  small  number  of  models,  it  is  straightforward  to  extend  the 
consistency  approach  of  [9]  to  yield  a  Euclidean  reconstruction  of  a  system  of  surfaces, 
though  the  study  of  ambiguities  in  the  reconstruction  appears  to  become  difficult.  It  is 
not  known  whether  these  ambiguities  allow  reconstructions  when  there  are  parametrised 
families  of  models. 

6  Discussion 

There  is  now  a  constructive  path  from  observations  of  outline  points  to  the  full  projective 
geometry  of  the  surface,  which  goes  as  follows: 

•  Fit an algebraic curve that is an outline to the observations - this process will also
yield the degree of the surface (by a search over degrees $d(d - 1)$ for increasing $d$, if
necessary).

•  Determine the singularities of this fitted curve (using the coefficients of the curve).

•  Compute the coefficients of the polynomials that would blow up these singularities,
as described above in section 3.1.3.

•  Use these coefficients to form a map from the plane to space (section 3.1.3). Apply
this map to a large number of points on the outline, yielding a collection of points in
space.

•  Determine the unique surface of degree $d - 1$ passing through these points in space.

•  Determine the five-parameter family of surfaces of degree $d$ passing through these
points in space.

•  Determine the $\nu_i$ of section 4, using the methods given there.

•  Substitute these $\nu_i$ into equation 1 of section 4, to give the coefficients of the
surface.

Although a simple implementation that successfully identifies cubic surfaces from synthetic
outline data exists, there are real difficulties in constructing an implementation of
this approach that works in a practical vision system:

•  Computing  the  outline  from  image  data  requires  fitting  high  degree  algebraic  curves 
to  edge  points.  The  degree  goes  up  as  the  square  of  the  degree  of  the  surface. 

* (Footnote to section 5.) The most practical technique is simply to form the commutators for the Lie algebra of the group
containing both Euclidean transformations and dilations, as described in [25], and then note that the span
of the set of commutators and generators is the Lie algebra of the projective group.

•  Computing the contour generator from the outline is tricky, as it requires finding
singularities  in  the  outline.  Unfortunately,  a  small  change  in  the  coefficients  of  the 
outline  can  lead  to  substantial  errors  in  the  computed  singularities,  both  in  location 
and  in  multiplicity.  Such  errors  are  guaranteed  by  the  fact  that  we  are  using  a  fitted 
curve.  Furthermore,  the  singularities  must  have  special  properties,  for  the  curve  to 
be  an  outline  at  all.  This  has  advantages  and  disadvantages:  the  curve  can  be  chosen 
from  a  smaller,  more  specialised  class  of  curves,  which  may  make  fitting  more  robust; 
at  the  same  time,  the  curve  produced  by  a  general  fitter  cannot,  in  general,  even  be 
an  outline. 

•  In  practice,  determining  the  surface  from  a  system  of  points  on  the  contour  generator 
involves  a  process  of  fitting  algebraic  surfaces  to  points  in  space,  and  has  the  associated 
instabilities.  Considerable  precision  in  the  points  is  required;  in  the  experiments  on 
synthetic data, this could be supplied, but it is doubtful whether such precision is
available  in  practical  situations. 
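
The growth in the size of the fitting problems can be made concrete by counting coefficients. The sketch below assumes only the standard monomial counts for homogeneous polynomials (three variables for the plane outline, four for the surface) and the outline degree $d(d - 1)$ stated above; the function names are illustrative.

```python
from math import comb


def outline_degree(d):
    """Degree of the outline of a generic surface of degree d."""
    return d * (d - 1)


def plane_curve_coefficients(k):
    """Number of monomials of degree k in three homogeneous variables."""
    return comb(k + 2, 2)


def surface_coefficients(d):
    """Number of monomials of degree d in four homogeneous variables."""
    return comb(d + 3, 3)


for d in (3, 4, 5):
    k = outline_degree(d)
    print(d, k, plane_curve_coefficients(k), surface_coefficients(d))
# prints:
# 3 6 28 20
# 4 12 91 35
# 5 20 231 56
```

Even for a cubic surface the outline is a sextic with 28 coefficients, and the count grows roughly as the fourth power of the surface degree.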

Despite  its  present  impracticality,  this  result  is  valuable,  primarily  because  it  shows  that 
shape  from  outline  is  possible  in  the  context  of  a  very  large  and  interesting  range  of  surfaces, 
and  thereby  opens  several  promising  avenues  of  research: 

•  It  is  hard  to  be  a  contour  generator.  In  the  case  of  algebraic  surfaces,  “most”  curves 
are  not  contour  generators,  because  either  their  degree,  their  genus,  or  the  number  and 
type  of  their  singularities  is  wrong.  There  is  good  reason  to  believe  that  a  similar  result 
must  hold  for  surfaces  drawn  from  a  “small”  parametrised  family  of  smooth  surfaces, 
because  the  range  of  contour  generators  for  a  given  surface  is  so  small,  although  the 
mechanisms  of  proof  and  of  computation  may  be  more  complex.  This  is  the  subject 
of  active  ongoing  research. 

•  It  is  an  example  of  a  recognition  algorithm  that  recognises  an  object  drawn  from 
a  large,  parametrised  world  (generic  algebraic  surfaces  of  degree  three  or  greater) 
without  searching  a  model-base.  If  this  algorithm  is  presented  with  such  a  surface,  it 
can  (in  principle)  immediately  describe  the  surface  up  to  the  intrinsic  ambiguities  of 
the  viewing  geometry.  Given  the  way  the  algorithm  is  framed,  verification  appears  to 
be  either  extremely  difficult  or  impossible,  and  so  the  role  of  the  model-base  becomes 
uncertain. 

•  It suggests that the global properties of systems of contour generators are important
objects of study. Compare the simple, neat structure of the family of contour generators
on a projective algebraic surface with the extraordinary complexity of its aspect
graph; in this case, the aspect graph is, in principle, redundant, because a single outline
contains sufficient information to determine the entire surface. Furthermore, in
the case of projective algebraic surfaces, the system of contour generators is one case
of a well understood class of objects: a linear system of curves on a surface. A study
of this system might yield a much more practical algorithm for recognising a surface
from two or three uncalibrated views, with an unknown transformation between the
views, by exploiting the fact that each outline represents a curve drawn from a small
(three-dimensional) linear system of curves.

•  It opens a number of curious geometric questions; for example, what is the relationship
between the six cusps on the outline of a cubic surface, and the surface? The contour
generator is obtained from the outline by blowing up these six cusps on the outline;
but, by standard results, if we were to extend this blowing up process to the whole
plane, we would obtain a surface passing through the contour generator, and the
degree of this surface could not be greater than three. Note that this is not a general
cubic surface, because the six points blown up are not in general position, but appears
to be a surface bearing some substantial relationship to the original cubic surface.

Determining a class of surfaces that is plastic enough to be useful for modelling a wide range
of real objects, yet rigid enough to allow strong statements about the shape of a particular
surface from a single outline, is the central issue in studying shape from contour. We have
shown that algebraic surfaces represent one extreme: a very large class of surface that is so
rigid that an outline determines a surface. There is room for much future work.

Abstract

A recognition strategy consisting of a mixture of indexing
on invariants and search allows objects to be recognised
up to a Euclidean ambiguity with an uncalibrated
camera. The approach works by using projective invariants
to determine all the possible projectively equivalent
models for a particular imaged object; then a system of
global consistency constraints is used to determine which of
these projectively equivalent, but Euclidean distinct, models
corresponds to the objects viewed. These constraints
follow from properties of the imaging geometry. In particular,
a recognition hypothesis is equivalent to an assertion
about, among other things, viewing conditions and geometric
relationships between objects, and these assertions
must be consistent for hypotheses to be correct. The approach
is demonstrated to work on images of real scenes
consisting of polygonal objects and polyhedra.

Keywords: Recognition, Computer Vision, Invariant Theory, Indexing,
Model-based Vision


1  Introduction 

Many recent object recognition systems model viewing
with an uncalibrated camera or using an uncalibrated
stereo pair as inducing either an affine or a projective
transformation on figure. This approach allows invariants
of the appropriate transformation to be used to index models
to produce a selection of recognition hypotheses. These
hypotheses are combined as appropriate, and the result is
back-projected into the image, and verified by inspecting
relationships between the back-projected outline and image
edges [3, 5, 9, 12, 14]. Indexing using projective invariants
has been demonstrated for plane objects and simple
polyhedral objects, and has been extended with varying
success to certain types of surfaces [1, 6, 8, 13, 15].
One main disadvantage of this approach is that objects
are identified only up to either an affine or a projective
ambiguity. This paper argues that this ambiguity is a consequence
of considering recognition hypotheses in isolation,
and is not intrinsic to the approach.

Systems based on indexing using projective invariants
have not, to date, been able to distinguish between objects
that are projectively equivalent, but not Euclidean
equivalent, because such objects have the same projective
invariants. In this paper, we show that a view of two or
more coplanar objects, or polyhedra, is enough to allow the
objects to be recognised up to only a Euclidean ambiguity,
if the objects can be recognised at all and if Euclidean
models are available.

1.1  Frames  and  terminology

Much  of  the  work  we  describe  consists  of  reconciling  dif¬ 
ferent  assertions  about  coordinate  frames.  As  a  result,  the 
discussion  can  become  confusing  without  an  established 
terminology.  The  paper  uses  the  following  terms: 

•  object:  an  actual  thing  in  the  world. 

•  model:  a  collection  of  known  measurements  of  the 
projective  and  Euclidean  geometry  of  an  object, 
which  is  stored  in  the  system.  A  model  could  con¬ 
sist  of  a  mixture  of  points,  lines,  planes,  conics  and 
more  complicated  curves  or  surfaces. 

•  model  frame:  the  frame  of  reference  in  which  the 
model  measurements  are  taken;  the  reference  points 
of  an  object  in  the  world  are  within  a  Euclidean  trans¬ 
formation  of  the  reference  points  in  this  frame. 

•  world  frame:  a  global  frame  of  reference,  in  which 
objects  exist.  If  the  world  consists  of  coplanar  plane 
objects,  then  the  world  frame  is  the  frame  of  reference 
within  this  plane;  otherwise,  the  world  frame  is  three- 
dimensional. 

•  image frame: a frame of reference constructed in
the image plane, usually by reference to the pixel
positions in the camera. For a view of a plane world, the
image frame is within an unknown projective
transformation of the world frame.

•  Euclidean transformation: a projective
transformation, equivalent to a rigid motion (rotation and
translation), expressed in homogeneous coordinates.

In this paper, the relationships between frames are
emphasized; these relationships are usually determined by
computing transformations between image features and model
features. Such a transformation, although computed using
some specific set of features, expresses the transformation
between the model frame and the image frame.


2  Plane  objects 


Consider a scene consisting of a set of distinct, coplanar
plane objects, many of which are represented in a
model-base. It is well known that any view of this scene with
an uncalibrated camera can be obtained by applying an
appropriate plane projective transformation to the scene.
The goal of a recognition algorithm is, given an image of the
scene, to label each object correctly up to Euclidean
equivalence.

Indexing  using  projective  invariants  (as  in  [9])  asso¬ 
ciates  with  each  group  of  image  features  a  collection  of 
object  models  (labels),  which  are  projectively  equivalent, 
but  Euclidean  inequivalent. 

If  only  one  known  object  is  present,  the  task  is  possi¬ 
ble  only  if  there  is  just  one  possible  label  for  that  object. 
If  two  or  more  labels  apply,  the  task  can  be  considered 
in  terms  of  constructing  the  largest  possible  consistent  la¬ 
belling,  because  implicit  in  each  recognition  hypothesis  is 
information  about  the  frame  in  which  the  objects  lie.  This 
information  can  be  formalised  to  obtain  possible  contra¬ 
dictions  between  recognition  hypotheses.  The  details  of 
the  idea  appear  below;  an  example  that  illustrates  the  rea¬ 
soning  appears  in  section  2.1.1. 

2.1  Theory 

Consider two coplanar plane objects, $o_1$ and $o_2$, for
which we have models $m_1$ and $m_2$. Write the transformation
from $m_k$ to $o_k$ as $E_k$, and the transformation
from $m_k$ to the image frame as $P_k$. $E_k$ is Euclidean,
and moves the model into its position in the world frame.
Write the projective transformation from the world frame
to the image frame as $Q$. Then $P_k = Q E_k$, so that
$P_1^{-1} P_2 = E_1^{-1} Q^{-1} Q E_2 = E_1^{-1} E_2$, which is Euclidean.
Since labelling an image curve with a particular model
name determines the transformation from that model's
frame to the image frame, the pairwise consistency of
labellings can be checked by forming a system of matrices
$P_1^{-1} P_2$, and checking whether they are Euclidean.
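
The pairwise test can be illustrated with a minimal sketch. It assumes 3 x 3 homogeneous matrices stored as nested Python lists, and proper rigid motions (rotation and translation, no reflection). The helper names (`matmul`, `inv3`, `euclid`, `is_euclidean`) and the tolerance are assumptions made for illustration; they are not taken from the system described.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    out = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [x for x in range(3) if x != j]  # delete row j
            s = [x for x in range(3) if x != i]  # delete column i
            minor = m[r[0]][s[0]] * m[r[1]][s[1]] - m[r[0]][s[1]] * m[r[1]][s[0]]
            out[i][j] = (-1) ** (i + j) * minor / d
    return out

def euclid(theta, tx, ty):
    """Rigid motion of the plane in homogeneous coordinates."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def is_euclidean(h, tol=1e-6):
    """True if h equals a rigid motion up to a non-zero scale factor."""
    if abs(h[2][2]) < 1e-12:
        return False
    g = [[v / h[2][2] for v in row] for row in h]  # remove the projective scale
    if abs(g[2][0]) > tol or abs(g[2][1]) > tol:
        return False
    a, b, c, d = g[0][0], g[0][1], g[1][0], g[1][1]
    return (abs(a * a + c * c - 1) < tol and abs(b * b + d * d - 1) < tol
            and abs(a * b + c * d) < tol and abs(a * d - b * c - 1) < tol)

# Two correctly labelled models seen through the same (made-up) projectivity Q.
Q = [[1.0, 0.2, 3.0], [0.1, 0.9, -2.0], [0.001, 0.002, 1.0]]
P1 = matmul(Q, euclid(0.3, 1.0, 2.0))
P2 = matmul(Q, euclid(-1.1, 4.0, 0.5))
consistent = is_euclidean(matmul(inv3(P1), P2))
```

With real image measurements the tolerance would have to reflect fitting error rather than floating-point error; the value used here is only suitable for the synthetic matrices above.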

Objects  can  consist  of  points,  or  of  some  mixture  of 
points,  lines,  conics,  and  other  curves,  as  long  as  a  pro¬ 
jective  transformation  can  be  computed  from  the  object 
frame  to  the  image  frame.  This  observation  also  justifies 
our  emphasis  on  coordinate  frames,  rather  than  on  par¬ 
ticular  geometric  configurations.  Section  2.2  details  the 
ambiguities  implicit  in  this  scheme. 

If a labelling is consistent it is possible to reconstruct
the whole plane, in the frame of one given model, since
$P_i^{-1} P_j$ gives the transformation from the configuration in
$m_j$'s frame to that in $m_i$'s frame. To reconstruct the plane
in, say, $m_1$'s frame, for all $m_k$ compute $P_1^{-1} P_k$ and then
apply this map to $m_k$; this will give a collection of objects
in $m_1$'s frame.
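
The reconstruction step can be sketched as follows, assuming each model is a list of 2D points in its own frame and each labelling supplies a 3 x 3 model-to-image matrix. The function names (`apply_h`, `reconstruct`) and the data layout are hypothetical, chosen only to make the sketch self-contained.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    out = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [x for x in range(3) if x != j]
            s = [x for x in range(3) if x != i]
            minor = m[r[0]][s[0]] * m[r[1]][s[1]] - m[r[0]][s[1]] * m[r[1]][s[0]]
            out[i][j] = (-1) ** (i + j) * minor / d
    return out

def apply_h(h, pt):
    """Apply a plane projective map to an inhomogeneous point (x, y)."""
    x, y = pt
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

def reconstruct(p_ref, models):
    """Map every model into the reference model's frame.

    models maps a model name to (P_k, points in that model's frame).
    """
    p_ref_inv = inv3(p_ref)
    return {name: [apply_h(matmul(p_ref_inv, p_k), pt) for pt in pts]
            for name, (p_k, pts) in models.items()}
```

For example, `reconstruct(P1, {"m2": (P2, m2_points)})` would return the points of the second model expressed in the first model's frame.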

2.1.1  Example 

Given an image of three objects, $o_1$, $o_2$ and $o_3$, which
are plane and coplanar, and which are instances of known
models, the recognition system would proceed as follows:

1.  Determine projective equivalence classes by
indexing the model-base using appropriate projective
invariants. For each object, the indexing stage
returns a collection of possible Euclidean models to
which it might correspond. Assume that the
response is: $o_1 \to (m_1, m_4, m_7)$, $o_2 \to (m_2, m_5, m_8)$
and $o_3 \to (m_3, m_6, m_9)$.

2.  Determine all image-model transformations
for every image-model correspondence using a least
squares process. Call the transformation between $o_i$
and $m_j$ $P_{ij}$. There is a total of nine transformations.

3.  Test consistency between model hypotheses for
each pair of objects. Thus, for $o_1$ and $o_2$, form the
matrices:

$P_{11}^{-1}P_{22}$, $P_{11}^{-1}P_{25}$, $P_{11}^{-1}P_{28}$,
$P_{14}^{-1}P_{22}$, $P_{14}^{-1}P_{25}$, $P_{14}^{-1}P_{28}$,
$P_{17}^{-1}P_{22}$, $P_{17}^{-1}P_{25}$, $P_{17}^{-1}P_{28}$;

if, say, $P_{11}^{-1}P_{22}$ is very close to being a Euclidean
matrix, then accept the pairing $(o_1 m_1, o_2 m_2)$. For
this example, assume that the pairs $(o_1 m_1, o_2 m_2)$,
$(o_1 m_1, o_3 m_3)$, $(o_2 m_2, o_3 m_3)$, $(o_1 m_7, o_2 m_8)$ are
consistent.

4.  Form the longest possible consistent hypothesis
by merging consistent pairings. Thus, in
this example, the longest consistent hypothesis is
$(o_1 m_1, o_2 m_2, o_3 m_3)$. This is accepted as the correct
labelling for the image; consistency is defined by
ensuring that each object has at most one label, so that
two pairings are consistent if they refer to distinct
objects, or if they assign the same labels to objects that
they share. The other possible consistent labelling is
$(o_1 m_7, o_2 m_8)$, which is the result of an ambiguity.
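
Step 4 can be sketched as a brute-force search over sets of labels. The sketch assumes that a label is an (object, model) tuple and that a hypothesis is accepted only when every pair of its labels passed the pairwise test; the text says only that consistent pairings are merged, so this stricter reading is an assumption. The function name `longest_consistent` is hypothetical, and the search is exponential in the number of labels, which is tolerable only for small scenes.

```python
from itertools import combinations

def longest_consistent(pairings):
    """Return the largest set of labels that is pairwise consistent.

    pairings is an iterable of 2-tuples of (object, model) labels that
    passed the pairwise Euclidean test. The result is a tuple of labels,
    or () if no pairing was supplied.
    """
    ok = {frozenset(p) for p in pairings}
    labels = sorted({label for p in ok for label in p})
    for size in range(len(labels), 1, -1):
        for cand in combinations(labels, size):
            objects = [obj for obj, _ in cand]
            if len(set(objects)) != len(objects):
                continue  # an object may carry at most one label
            if all(frozenset(pair) in ok for pair in combinations(cand, 2)):
                return cand
    return ()

# The consistent pairs assumed in the example above.
pairs = [
    (("o1", "m1"), ("o2", "m2")),
    (("o1", "m1"), ("o3", "m3")),
    (("o2", "m2"), ("o3", "m3")),
    (("o1", "m7"), ("o2", "m8")),
]
best = longest_consistent(pairs)
```

For the example pairs, `best` labels all three objects with $m_1$, $m_2$ and $m_3$; the shorter alternative $(o_1 m_7, o_2 m_8)$ is found only if the longer hypothesis is absent.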

2.2  Ambiguities 

An  ambiguous  image  supports  two  or  more  consistent 
labellings  that  are  indistinguishable,  one  of  which  will  be 
correct.  Ambiguities  arise  from  quite  complex  interactions 
between  properties  of  the  image  and  of  the  modelbase; 
some  modelbases  may  not  admit  ambiguities.  We  assume 
that  projectively  distinct  objects  receive  distinct  labels, 
and  study  inherent  ambiguities  in  the  consistency  process. 

Definition: Two pairs of models, say $\{m_1, m_2\}$,
$\{m_1', m_2'\}$, admit an ambiguous labelling if there
is some image containing objects $\{o_1, o_2\}$ so that
$\{o_1 m_1, o_2 m_2\}$ and $\{o_1 m_1', o_2 m_2'\}$ are both
consistent labellings of the objects.


Admitting an ambiguous labelling poses a stringent
constraint on the models in the modelbase. If two pairs
of models, say $\{m_1, m_2\}$, $\{m_1', m_2'\}$, admit an ambiguous
labelling, then there exist projectivities $P_{11'}$ and $P_{22'}$ such
that $m_1' = P_{11'} m_1$ and $m_2' = P_{22'} m_2$.

It can be shown that, for the configurations to be
ambiguous, there are Euclidean transformations $E_a$, $E_b$ such
that $P_{11'} = E_a P_{22'} E_b$. This is an action of two copies of
the Euclidean group on the space of projective
transformations, and invariants can be obtained for this action. For
example, writing the $i,j$'th component of a matrix $Q$ as
$q_{ij}$, the expression $(q_{20}^2 + q_{21}^2)^3/\det(Q)^2$ is an invariant
under this action. This means that this expression must take
the same value for $Q = P_{11'}$ as it does for $Q = P_{22'}$. Thus,
for a modelbase to admit ambiguities, it must contain at
least two pairs of projectively equivalent models, where the
projectivities between the ambiguous models have special
properties. In turn, this statement suggests that
ambiguities are unlikely. However, there is some reason to believe
that modelbases containing man-made objects are likely
to contain ambiguities; for example, a sequence of scaled
versions of several different objects will certainly give rise
to ambiguities.
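
The invariance can be checked numerically with a short sketch. It assumes that the indices of $q_{ij}$ count from zero (so $q_{20}$ and $q_{21}$ are the first two entries of the last row), and that the exponents are 3 and 2, which is the choice that cancels an arbitrary scale on a 3 x 3 matrix. The matrices and helper names below are made up for illustration.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def euclid(theta, tx, ty):
    """Rigid motion of the plane in homogeneous coordinates."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def invariant(q):
    """(q20^2 + q21^2)^3 / det(Q)^2, with zero-based indices."""
    return (q[2][0] ** 2 + q[2][1] ** 2) ** 3 / det3(q) ** 2

# An arbitrary projectivity, then the same map after acting on both
# sides with rigid motions and applying an arbitrary projective scale.
Q = [[1.0, 0.2, 3.0], [0.1, 0.9, -2.0], [0.3, -0.4, 1.0]]
moved = matmul(matmul(euclid(0.7, 2.0, -1.0), Q), euclid(-0.4, 0.5, 3.0))
moved = [[2.5 * v for v in row] for row in moved]
same = math.isclose(invariant(Q), invariant(moved), rel_tol=1e-9)
```

The left factor leaves the last row of $Q$ unchanged and the right factor rotates its first two entries, so their squared norm is preserved; both factors have unit determinant, and the exponents cancel the overall scale.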

2.3  Implementation  details  and  experiments

A  system  implementing  the  approach  described  has 
been  demonstrated  on  real  images  of  simple  scenes,  us¬ 
ing  a  stripped-down  version  of  the  system  described  in  de¬ 
tail  in  [9]  to  perform  early  vision  and  fitting.  Indexing, 
though  performed  in  a  reimplementation  of  that  system, 
follows  essentially  the  same  pattern,  but  in  the  present  sys¬ 
tem  a  successful  indexing  attempt  returns  a  collection  of 
Euclidean  models.  To  focus  attention  on  the  Euclidean  la¬ 
belling  properties  of  the  system,  the  model  base  contains 
only  one  projective  equivalence  class  of  models,  consisting 
of  five  projectively  equivalent  but  Euclidean  distinct  mod¬ 
els,  so  that  for  all  known  models  the  projective  invariants 
are  the  same.  The  models  all  consist  of  polygons  with  five 
sides;  objects  were  obtained  by  cutting  these  polygons  out 
of  black  cardboard. 

The  success  of  this  approach  can  be  measured  both  by 
determining  its  effectiveness  in  labelling  the  scene,  and 
by  looking  at  Euclidean  invariants  of  an  unknown  object, 
coplanar  with  the  known  objects  in  the  scene,  and  recon¬ 
structed  using  the  techniques  described;  stability  in  these 
invariants  means  that  the  Euclidean  labelling  was  suffi¬ 
ciently  successful  to  allow  the  Euclidean  structure  of  other 
objects  to  be  determined  from  the  labelling.  Some  results 
are  shown  in  figures  1  and  3. 

3  3D  objects 

The situation is more difficult when the objects are
three-dimensional. It is known that the projective
geometry of a range of polyhedra can be recovered partially or
completely from a single perspective view with an
uncalibrated camera (see [10, 11]). In turn, projective invariants
can be computed and used to index the polyhedron in a
model-base. Appropriate polyhedra are position-free in
views, and tend to contain many faces with four or more
vertices. However, for most situations, the resulting
projective ambiguity is too great. To proceed, it is necessary
to assume that, for each generic view of every
polyhedron in the modelbase, a distinctive projective structure
can be recovered (details appear in [10]).

Figure 1: Six examples of scenes containing known
coplanar plane objects (five sides), and an unknown
object, imaged with a projective camera.

Figure 2: The Euclidean labels chosen by a global
consistency analysis of the corresponding scenes in figure
1, superimposed on the backprojected outlines of the
objects, which are five-sided plane polygons. Although
the labellings are correct, the system consistently
ignores one object (apparently as a result of a
segmentation difficulty with one poorly cut corner).

Figure 3: The graph shows the area of the unknown
quadrilateral, measured in the six images shown above
by computing a backprojection based on the Euclidean
recognition hypotheses; note that the area is relatively
stable.

The perspective projection from 3D projective space
to the image plane is modelled by a 3 x 4 projection
matrix, $P$, so that

$\mathbf{x} \approx P\mathbf{X}$  (1)

where homogeneous coordinates are used, $\mathbf{X} =
(X, Y, Z, 1)^T$, $\mathbf{x} = (x, y, 1)^T$ and $\approx$ indicates equality up to
a non-zero scale factor. Following Hartley [4], we partition
$P$ as

$P = (M \mid -M\mathbf{t})$  (2)

where $\mathbf{t}$ is the focal point (since the focal point projects
as $P\mathbf{X} = \mathbf{0}$). Provided the first 3 x 3 matrix, $M$, is not
singular (i.e. the focal point is not on the plane at infinity),
$P$ can always be partitioned in this way.

To  determine  the  position  of  the  focal  point  in  the  model 
frame  proceed  as  follows: 

1.  Compute  the  projection  matrix  P  from  the  known 
model  vertices  and  their  corresponding  image  posi¬ 
tions. 

2.  Partition  P  as  above.  This  determines  t,  which  is  the 
focal  point  in  the  object’s  frame. 

3.  The  rays  passing  through  other  image  outline  points 
are  given  as  the  pre-image  in  P  of  the  image  points. 
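
Steps 2 and 3 can be sketched as follows, assuming $P$ has already been estimated (step 1) and is stored as a 3 x 4 nested list. With the partition in equation (2), the last column of $P$ is $-Mt$, so $t$ follows by solving a 3 x 3 linear system, and the ray through an image point $x$ has direction $M^{-1}x$. The function names and the numerical values are illustrative assumptions.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    out = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [x for x in range(3) if x != j]
            s = [x for x in range(3) if x != i]
            minor = m[r[0]][s[0]] * m[r[1]][s[1]] - m[r[0]][s[1]] * m[r[1]][s[0]]
            out[i][j] = (-1) ** (i + j) * minor / d
    return out

def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def focal_point(p):
    """Recover t from P = (M | -M t); M must be non-singular."""
    m = [row[:3] for row in p]
    last_column = [row[3] for row in p]
    return [-c for c in matvec(inv3(m), last_column)]

def ray_direction(p, x):
    """Direction, in the model frame, of the ray imaged at homogeneous x."""
    m = [row[:3] for row in p]
    return matvec(inv3(m), x)

# A made-up projection matrix built from a known M and focal point t.
M = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
t = [1.0, 2.0, -5.0]
Mt = matvec(M, t)
P = [M[i] + [-Mt[i]] for i in range(3)]
t_recovered = focal_point(P)
```

Every point $t + s\,M^{-1}x$ on the recovered ray projects to $x$ up to scale, which is the pre-image property used in step 3.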

An  alternative  construction,  due  to  Mohr,  is  also  possi¬ 
ble  [7].  Labelling  an  image  with  a  consistent  Euclidean 
labelling  proceeds  as  follows: 


•  Determine  a  set  of  projectively  equivalent,  Euclidean 
inequivalent  labels  for  each  polyhedron  visible,  using 
the  indexing  methods  of  [10]. 

•  For  each  labelling  of  each  item,  compute  the  focal 
point  and  an  appropriate  cone  of  rays  in  the  object’s 
frame. 

•  Construct  the  largest  pairwise  consistent  labelling  of 
the  scene,  where  pairwise  consistency  is  checked  by 
determining  that  the  focal  point  and  cone  of  rays  con¬ 
structed  by  assuming  one  object,  is  Euclidean  equiva¬ 
lent  to  that  constructed  by  assuming  a  second  object. 

As in the case of coplanar plane objects, although a correct
labelling must be consistent in the sense given, a consistent
labelling may not be correct. Thus, for particular scenes
and particular model-bases, a unique labelling may not, in
fact, be possible.

3.1  Ambiguities 

The range of possible ambiguities in the case of
polyhedral objects is wider than in the case of plane
objects. Problems arise both as a result of viewing and
self-occlusion issues and because the reconstructed polyhedra
do not share the same projective frame. Ambiguities
resulting from self-occlusion are not treated here, as the
nature of the ambiguities depends in a complicated way on
the structure of the recognition system. If the projective
class of the object has been correctly recovered, and the
cones through the object vertices and the focal point have
been constructed correctly, the following tractable
question remains: given an image, and the projective structure
of the polyhedra represented in that image, what
ambiguities exist in the Euclidean labelling process described?

There is now a second source of ambiguity: many
distinct objects can produce the same cone of rays through
the focal point. It can be shown that, for a modelbase
to admit an ambiguity, it must contain four elements
$p_1, p_2, p_1', p_2'$, with $p_i$ and $p_i'$ projectively equivalent, and
where the transformations between the model frames
satisfy:

^ papi  “ 

for some arbitrary translation $T$, elation $D$ and Euclidean
transformation $E$. Note that, since the total number of
arbitrary degrees of freedom is 13, not every pair of
transformations $P_{p_1 p_1'}$, $P_{p_2 p_2'}$ will have this property. As a result,
not every modelbase admits an ambiguity, and
unambiguous labellings appear possible for at least some modelbases.
Many man-made objects yield modelbases that admit
ambiguities (for example, a modelbase containing only cubes
of different sizes).

3.2  Experiments 

A system implementing the approach described has
been demonstrated on real images of simple scenes. The
modelbase contains five polyhedral objects, all projectively
equivalent. Polygon vertices are marked in the image by
hand; all further processing is automatic. Figure 4 shows
typical images; figure 5 shows the Euclidean labelling for
corresponding images and figure 6 shows a reconstruction.
The reconstruction techniques used at present can produce
a reconstruction that is within an improper rotation (with
negative determinant) of the original world; this appears to
be an intrinsic ambiguity in the purely projective methods
used, and may be overcome by considering the direction
in which the camera is pointing. The techniques do not
extend to reconstructing unknown objects in the way that
the plane techniques do.

Figure 4: Four examples of scenes containing known
polyhedral objects all of which are projectively
equivalent, imaged using a perspective camera.

4  Discussion 

We have shown methods for using geometric consistency
in 2D from 2D recognition, and in 3D from 2D recognition;
because the argument is based on frames and maps, the
2D from 2D argument carries over to the 3D from 3D case
without modification (for example, on the output of
“uncalibrated stereo” [2]).

Linking  these  ideas  is  the  observation  that  recognition 
hypotheses  are  frame  hypotheses.  When  a  program  asserts 
that  some  Euclidean  model  produced  an  image  observa¬ 
tion,  it  is  making  a  statement  about  camera  position  and 
internal  parameters.  Such  statements  lead  to  global  consis¬ 
tency  constraints  that  must  hold.  If,  in  a  world  with  many 
known  objects,  objects  can  be  recognised  effectively  and 
unknown  objects  can  be  reconstructed  up  to  a  Euclidean 
ambiguity  without  camera  calibration,  there  is  no  reason 
to  calibrate  the  camera.  Other  constraints  can  be  applied 
to the reconstruction (for example, using known objects,
occlusion cues and up-vector estimates to bound the
distance to the object). This paper has dealt with discrete
modelbases;  the  case  of  parametrised  systems  of  models 
bears  further  investigation. 

More global consistency mechanisms are available; for
example, the orderly and effective exploitation of the
potential of t-junctions to explain occlusions. Many other
sources of information, not necessarily primarily
geometric, could be used to inform and strengthen recognition
hypotheses (and thereby shore up minor failures of the
consistency mechanism).

Figure 5: Euclidean labels chosen by a global
consistency analysis of the scenes in figure 4, superimposed
on the backprojected outlines of the objects. A label
“i/j” means that there is a consistent interpretation
where the object is either object i or object j.
Incorrect labels are enclosed in parentheses.

Figure 6: A Euclidean reconstruction of the
polyhedral world shown in one image (bottom left image, all
labels correct) taken from figure 4. The focal point
is the marked point in the top right-hand corner of
the figure. Note the distortions of the boxes, caused
by mapping all boxes into the frame of one box; more
sophisticated techniques might distribute error more
evenly. The focal point is included so the reader can
assess the effectiveness of the reconstruction by
comparing with figure 4, which is seen from a different
viewpoint; note that, for example, the bases of all
boxes are near to coplanar.

Finally,  consistency  mechanisms  of  the  type  described 
are  most  effective  in  worlds  well-populated  with  familiar 
objects.  We  believe  that  the  tremendous  potential  power  of 
a  global  consistency  analysis  will  become  most  important 
in  a  system  with  a  large  modelbase,  operating  in  a  complex 
world. 
Abstract. In many cases, the geometric representation that a recognition
system could recover is insufficient to identify objects. When object
geometry is simple, it is not particularly distinctive; however, a rich
representation can be obtained by mapping the surface markings of the
object onto the geometry recovered. If edges are mapped, a representation
that is relatively insensitive to the details of lighting can be recovered.
Mapping grey levels or color values leads to a highly realistic graphical
representation, which can be used for rendering. The idea is
demonstrated using extruded surfaces, which consist of a section of a general
cone cut by two planes. Such surfaces possess a simple geometry, yet
are widespread in the real world. The geometry of an extruded surface
is simple, and can easily be recovered from a single uncalibrated image.
We show examples based on images of real scenes. Keywords: Object
recognition, representation, surface markings, invariants.


1  Introduction 

Efficient  object  recognition  programs  require  distinctive  object  representations; 
in  many  applications,  such  as  surveillance,  video  processing  and  image  databases, 
only  a  single  image  is  available.  Much  work  has  been  done  on  recovering  object 
geometry from single, uncalibrated images. This paper shows how surface
patterns and markings can be recovered and associated with the geometry recovered,
and demonstrates a representation that captures both shape and pattern for a
large class of surfaces covering many man-made objects.


1.1  Recognition  Using  Indexing 

In  typical  modern  systems,  recognition  proceeds  by: 

-  Feature  extraction:  Edges  are  extracted  and  the  resulting  geometric  prim¬ 
itives  are  grouped  into  likely  object  groups  (for  example,  a  group  of  five  lines, 
of  two  conics,  or  of  a  single  “M-curve”  in  [11]). 

-  Indexing  and  hypothesis  merging:  Feature  groups  are  used  to  compute 
geometric  primitives  that  index  the  object  in  a  model  base  (for  example, 
the  projective  invariants  of  a  pair  of  conics  in  [11]).  Normally,  one  object 
leads  to  several  groups  of  features  indexing  the  appropriate  model,  so  the 
resulting  hypotheses  must  be  merged  using  consistency  criteria  to  obtain  a 
single  recognition  hypothesis. 


-  Verification: Hypothesized objects are back-projected into the image; the
results are used to obtain further information that may confirm the
hypothesis.

Systems  of  this  sort  have  been  demonstrated  for  plane  objects  in  a  number 
of  papers  [3,  11,  4,  7,  13,  16].  Typically,  object  models  consist  of  a  system  of 
invariant  values  and  are  therefore  relatively  sparse,  meaning  that  hypothesis 
verification  is  required  to  confirm  a  model  match.  However,  no  searching  of  the 
model  base  is  required  because  the  hypothesised  object’s  identity  is  determined 
by  the  invariant  descriptors  measured.  These  systems  are  attractive  because,  in 
the  ideal  case,  an  object  description  is  computed  from  the  image  and  identifies 
the  object,  without  requiring  that  a  model  base  be  searched.  As  a  result,  systems 
with relatively large model bases can be constructed^1.


1.2  Recognising  Curved  Surfaces 

The  indexing  systems  described  assume  either  that  the  model  base  contains  only 
plane  or  polyhedral  objects,  or  that  depth  data  is  available.  Recognising  curved 
surfaces  in  single  images  presents  a  particularly  difficult  problem,  as  the  change 
in  appearance  of  a  curved  surface  as  it  is  imaged  from  different  viewing  positions 
is  not  easily  analysed.  Throughout  the  paper,  we  assume  an  idealised  pinhole 
camera.  These  cameras  possess  a  focal  point  and  an  image  plane.  For  each  point 
in  space,  there  is  a  line  through  that  point  and  the  focal  point;  the  point  in  space 
appears  in  the  image  as  the  intersection  of  this  line  with  the  image  plane.  An 
orthographic view occurs when the pinhole is “at infinity”.

If  the  focal  point  is  fixed  and  the  image  plane  is  moved,  the  resulting  distor¬ 
tion  of  the  image  is  a  collineation.  In  what  follows,  it  is  assumed  that  neither 
the  position  of  the  image  plane  with  respect  to  the  focal  point  nor  the  size  and 
aspect  ratio  of  the  pixels  on  the  camera  plane  is  known,  so  that  the  image  pre¬ 
sented  to  the  algorithm  is  within  some  arbitrary  collineation  of  the  “correct” 
image.  In  this  model,  the  image  plane  makes  no  contribution  to  the  geometry, 
and  its  position  in  space  is  ignored. 

The outline of a surface is a plane curve in the image, which itself is the
projection of a space curve, known as a contour generator^2. The contour generator
is  given  by  those  points  on  the  surface  where  the  surface  turns  away  from  the 
image  plane;  formally,  the  ray  through  the  focal  point  to  the  surface  is  tangent 
to  the  surface.  As  a  result,  at  an  outline  point,  if  the  relevant  surface  patch  is 
visible,  nearby  pixels  in  the  image  will  see  vastly  different  points  on  the  surface, 
and  so  outline  points  usually  have  sharp  changes  in  image  brightness  associated 
with  them.  Figure  1  illustrates  these  concepts. 

It is generally accepted that the problem of recovering surface geometry from
an outline alone is intractable if the surface is constrained only to be smooth, or
piecewise smooth, as in this case significant changes can be made to the surface
geometry without affecting the outline from a given viewpoint. As a result, an
important part of the problem involves constructing as large a class of surfaces as
possible that can either be directly recognised or usefully constrained from their
outline alone. There have been advances in a number of special cases: [5, 8, 2]
describe systems for recognising rotationally symmetric surfaces from outlines,
[1, 8, 10, 14, 15] treat straight homogeneous generalised cylinders and [6] describes
recognising algebraic surfaces from outlines. All these systems emphasize surface
geometry in recognition.

Fig. 1. The outline and contour generator of a curved object, viewed from a perspective
camera.

^1 Current systems using this approach have model-bases containing of the order of
thirty objects.

^2 There are a number of widely used terms for both curves, and no standard
terminology has yet emerged.

Other  systems,  such  as  those  of  [12,  9]  emphasize  cues  such  as  color  and 
surface  lightness  over  geometry.  These  cues  are  particularly  effective  in  dealing 
with  objects  whose  geometry  is  ill-defined  or  hard  to  describe  (such  as  shirts 
or  jerseys).  Such  cues  cannot,  however,  be  completely  divorced  from  shape, 
as by themselves they are not particularly distinctive. Difficulties with these
cues  include  perspective  effects  making  it  difficult  to  frame  colour  features  in 
a  truly  shape  independent  fashion  (foreshortening  can  have  substantial  effects 
on  a  colour  histogram)  and  the  profound  sensitivity  of  colour  and  grey-level  to 
illumination. 

2  Recovering  Extruded  Surfaces 

An  extruded  surface  is  a  surface  formed  by  a  section  cut  from  a  general  cone 
by  two  planes  (see  figure  2)  in  such  a  way  that  the  section  of  surface  does  not 
include the vertex of the cone. This is the projective generalisation of a
surface formed by a system of parallel lines, with plane ends (such a surface can
be extruded from a nozzle). Surfaces with this geometry have been the subject
of a number of investigations (see, for example, [10, 14, 15], whose more general
theories cover these surfaces as special cases), as they are extremely common,
either in themselves or as components of more complex objects; examples include
most tin cans, boxes, books, and many plastic bottles.


Fig.  2.  A  range  of  examples  of  extruded  surfaces;  note  that  for  most  examples,  the 
vertex  is  at  infinity. 


The  outline  of  an  extruded  surface  consists  of  a  system  of  line  segments 
(possibly  empty  for  some  views),  the  projection  of  one  plane  section,  and  the 
(possibly  occluded)  projection  of  the  other  plane  section;  in  particular,  for  all 
focal points outside a simple “forbidden zone”, which lies between the top and
bottom  sectioning  planes  and  to  the  object’s  side  of  their  line  of  intersection,  one 
or  the  other  plane  section  is  completely  visible,  and  there  are  usually  at  least 
two  line  segments  in  the  outline. 

The projective geometry of an extruded surface can be completely
reconstructed in a single image (given the focal point is generic), because it is so
simple. Consider one of the plane sections, taken with the line on that plane
where the two sectioning planes intersect; call the plane chosen the defining
plane and the configuration of plane section and line a defining plane section or
D.P.S. The complete projective geometry of the surface can be represented by
a defining plane section, as it is possible to choose a projective
transformation of space that fixes the D.P.S., transforms the vertex to any given point
off the defining plane section, and transforms the other sectioning plane, which must pass
through the line in the D.P.S., to any given plane that is neither the defining plane nor
contains the vertex. As a result, the surface can be reconstructed by taking a
D.P.S., choosing a second sectioning plane through the line on the D.P.S.
arbitrarily, and choosing an arbitrary vertex in space^3. The section of surface that
lies between the planes and does not include the vertex is the reconstruction.

^3 Of course, the second plane must not be a second copy of the defining plane, and
the vertex may not lie on either plane.




Fig.  3.  Determining  the  projection  into  the  image  of  the  line  of  intersection  of  the  two 
sectioning  planes. 


As at least one plane section is visible in most images, it is possible to
recover the surface’s projective geometry if the projection of the line of
intersection between the planes in space can be recovered in the image. This can
be done, if the projection of the vertex is known, using the following approach
(which is illustrated in figure 3):

-  Mark three points, $p_1$, $p_2$ and $p_3$, on one plane section. These correspond
to the three coplanar points $P_1$, $P_2$ and $P_3$ in space.

-  Construct the corresponding points on the other plane section (for objects
with a convex cross-section, self-occlusion is not, in fact, a problem here; see
below). Call these points $p'_1$, $p'_2$, $p'_3$, and the corresponding coplanar
points in space $P'_1$, $P'_2$, $P'_3$.

-  Intersect the image lines $p_1 p_2$, $p'_1 p'_2$ to form $q_{12}$ and the lines
$p_2 p_3$, $p'_2 p'_3$ to form $q_{23}$. Since $P_i$, $P'_i$, $P_j$, $P'_j$ must be
coplanar (the lines $P_i P'_i$ and $P_j P'_j$ are rulings of the surface, and hence
pass through the vertex), the intersections in the image are projections of
actual intersections in space of the lines $P_1 P_2$, $P'_1 P'_2$, etc.

-  Since $P_1 P_2$ lies on the top plane, and $P'_1 P'_2$ lies on the bottom plane,
their intersection must lie on the intersection between the two planes. Thus, the
projected line of intersection must pass through $q_{12}$ and through $q_{23}$.
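
The construction above can be sketched in a few lines using homogeneous image coordinates, in which the cross product both joins two points into a line and meets two lines in a point. This is only an illustrative sketch, not the implementation used for the experiments; the function names `cross` and `intersection_line` are hypothetical, and the marked points are assumed to be exact and in general position.

```python
def cross(a, b):
    """Cross product of homogeneous 3-vectors: joins two points or meets two lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersection_line(top, bottom):
    """Image of the line where the two sectioning planes meet.

    top    -- image points (p1, p2, p3) on one plane section, as (x, y) pairs
    bottom -- corresponding image points (p1', p2', p3') on the other section
    Returns the homogeneous line (a, b, c), meaning a*x + b*y + c = 0.
    """
    p1, p2, p3 = [(x, y, 1.0) for x, y in top]
    r1, r2, r3 = [(x, y, 1.0) for x, y in bottom]
    q12 = cross(cross(p1, p2), cross(r1, r2))  # meet of p1p2 and p1'p2'
    q23 = cross(cross(p2, p3), cross(r2, r3))  # meet of p2p3 and p2'p3'
    return cross(q12, q23)                     # line joining q12 and q23
```

For example, with the projected vertex at the image origin, `intersection_line([(1, 0), (0, 1), (1, 1)], [(2, 0), (0, 3), (4, 4)])` returns a multiple of $(1, 3, 5)$, i.e. the line $x + 3y + 5 = 0$. By Desargues' theorem the third intersection $q_{13}$ lies on the same line, which gives a simple consistency check on the marked correspondences.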

The aspect of this construction most likely to provide problems for early
vision is the identification of corresponding points, which is complicated by
self-occlusion. This problem is relatively easy to deal with if the cross-section
of the cone is convex^. For such cross-sections, which are well represented in
applications, reasoning from a plane cross-section of the object is sufficient.
It is apparent that only three possible cases of self-occlusion are significant:

1.  one  sectioning  plane  is  occluded,  and  one  is  visible; 

2.  both  sectioning  planes  are  visible; 

3.  both  are  occluded. 

Figure  4  shows  how  each  case  can  occur.  In  case  3,  it  is  possible  to  recover  the 
projective  geometry  of  the  object  only  partially;  as  this  case  is  “unusual”  (from 
the  position  of  the  focal  point),  we  concentrate  on  the  other  two  cases. 




Fig.  4.  The  possible  cases  of  occlusion  of  both  top  and  bottom  section  curves  depend 
on  the  position  of  the  focal  point  with  respect  to  the  cone,  and  with  respect  to  the 
top  and  bottom  sectioning  planes;  this  figure  shows  the  possible  cases,  for  an  object 
viewed  in  section.  Reasoning  from  a  section  is  sufficient  if  the  cross-section  is  convex. 


For  a  convex  object,  top-bottom  correspondences  are  also  easily  obtained. 
Consider  a  general  plane  through  the  vertex  and  through  the  focal  point;  such 
a plane (which appears in the image as a line) cuts the top and bottom
sections  in  a  number  of  points,  some  or  all  of  which  are  visible  in  the  image  as 
points  along  a  line.  The  cases  are  as  follows: 

-  One section visible: if (say) the bottom section is occluded, all but one
of the points of intersection between the given plane and the bottom section
are occluded (figure 5). Hence, the number of visible intersections is odd,
and points appear in a cyclic permutation of the order $(u_1, u_2, v_2)$ (where $u$

®  For  neatness’  sake,  we  use  a  definition  of  convexity  compatible  with  projective  trans¬ 
formations;  define  a  region  to  be  convex  if  its  intersection  with  any  line  is  connected. 
This  definition  is  broader  than  the  usual  definition,  but  the  property  is  preserved 
under  projective  transformations. 


Fig. 5. The effects of occlusion on determining top-bottom correspondences, for a
convex cross section and the three cases given in the text. Panels (each with the
image line through the vertex marked): Case 1, one section visible, one occluded;
Case 2, both sections visible; Case 3, both sections occluded.


and $v$ are either top or bottom respectively, and $u_i$ and $v_i$ correspond) along
the image line.

-  Both sections visible: if neither section is occluded, all points of
intersection will be visible and there will be an even number of intersections
between the outline and the image line, which appear in a cyclic permutation of
the order $(u_1, u_2, v_2, v_1)$ along the image line.

-  Both sections occluded: if both sections are occluded, one point of
intersection will be visible for each of the top and bottom sectioning planes,
and there will be two intersections between the outline and the image line, which
clearly appear in a cyclic permutation of the order $(u_1, v_1)$ along the image line.
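
For a convex cross-section the three cases can be told apart simply by counting the outline points met along an image line through the projected vertex (three, four or two). The following is a minimal sketch of that bookkeeping; the function name `occlusion_case` and the string labels are hypothetical, and the returned label order is only defined up to the cyclic permutation mentioned in the text.

```python
def occlusion_case(n_visible):
    """Map the number of visible outline crossings to (case, label order).

    Assumes a convex cross-section, so a plane through the vertex and the
    focal point cuts each section curve in two points.
    """
    cases = {
        3: ("one section visible", ("u1", "u2", "v2")),
        4: ("both sections visible", ("u1", "u2", "v2", "v1")),
        2: ("both sections occluded", ("u1", "v1")),
    }
    if n_visible not in cases:
        raise ValueError("count not expected for a convex cross-section")
    return cases[n_visible]
```

A count outside these three values suggests either a non-convex cross-section or an error in outline marking, which is why the sketch raises rather than guessing.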


2.1  Obtaining  Surface  Markings  for  Extruded  Surfaces 

To transfer surface markings from the image to the representation of the surface,
it is convenient to break up the cross-section in the image into a polygonal
approximation (the polygons can be arbitrarily small), and then note that this
leads to a representation of the surface as a system of plane quadrilaterals.
Finally, the image positions to which the 3D vertices project are known.
This determines the projective transformation from the image quadrilateral to
the surface quadrilateral (four points and their images determine a projective
transformation), and this transformation applied to pattern points maps them
onto the surface representation (figure 6). Clearly, if this texture mapping
process works for grey-level surface markings, it will work for color surface
markings as well. A number of color sequences exist, although production expense
dictates that the examples given show results only for the grey level case.
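
The statement that four points and their images determine a projective transformation can be made concrete as follows. This is a sketch under stated assumptions rather than the code used for the experiments: the helper names (`basis_map`, `homography`, `transfer`) are hypothetical, no three of the four points in either quadrilateral may be collinear, and the method shown (mapping a standard projective basis to each quadrilateral and composing) is one of several equivalent ways to obtain the transformation.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (three of the points are collinear)")
    # Entry (i, j) of the inverse is cofactor (j, i) divided by the determinant.
    return [[(m[(j + 1) % 3][(i + 1) % 3] * m[(j + 2) % 3][(i + 2) % 3]
              - m[(j + 1) % 3][(i + 2) % 3] * m[(j + 2) % 3][(i + 1) % 3]) / d
             for j in range(3)] for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def basis_map(quad):
    """Matrix sending (1,0,0), (0,1,0), (0,0,1), (1,1,1) to the four points."""
    h = [(x, y, 1.0) for x, y in quad]
    cols = [[h[j][i] for j in range(3)] for i in range(3)]  # columns: first 3 points
    lam = matvec(inv3(cols), h[3])  # scales so the columns sum to the 4th point
    return [[cols[i][j] * lam[j] for j in range(3)] for i in range(3)]


def homography(src, dst):
    """3x3 projective transformation taking the quadrilateral src to dst."""
    return matmul(basis_map(dst), inv3(basis_map(src)))


def transfer(H, point):
    """Apply H to a pattern point (x, y) and return inhomogeneous coordinates."""
    x, y, w = matvec(H, (point[0], point[1], 1.0))
    return (x / w, y / w)
```

In use, `src` would be an image quadrilateral and `dst` the matching surface quadrilateral expressed in its own plane coordinates; every pattern point inside the image quadrilateral is then passed through `transfer`. Because the map is projective, the intersection of the diagonals of `src` is sent to the intersection of the diagonals of `dst`, which is a convenient sanity check.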


3  Experimental  Results  and  Discussion 

This  scheme  has  been  implemented  for  images  of  real  scenes.  We  use  a  simple 
manual  process  to  mark  outline  points  in  images;  in  particular,  the  operator 
marks the two lines that determine a vertex, points on the top curve, and
corresponding points on the bottom curve. This manual interface was used to allow
a  quick  implementation  to  demonstrate  the  recovery  process;  we  do  not  believe 
that  it  is  essential.  The  representations  extracted  are  3D  geometrical  objects  with 
associated  surface  markings,  and  therefore  lend  themselves  well  to  display  as  a 
movie.  Figure  7  shows  typical  images  from  which  models  are  extracted;  figures  8 
and  9  show  frames  from  movies  of  tumbling  objects.  Note  that  the  texture  on  the 
object  is  stable  as  the  object  tumbles,  indicating  that  the  surface  markings  are 
being  correctly  extracted  and  placed;  note  further  that,  if  the  operator  chooses 
a  projective  frame  in  which,  for  example,  the  soda  cans  have  a  circular  cross 
section  and  roughly  the  right  aspect  ratio,  the  models  are  impressively  realistic. 
Figure  10  shows  two  views  of  a  simple  3D  world  constructed  using  a  tool  that 
places  extruded  models  in  space  with  respect  to  one  another. 




Fig.  6.  Patterns  and  surface  markings  are  transferred  to  the  representation  of  the 
extruded  surface,  using  the  fact  that  four  known  correspondences  determine  a  plane  to 
plane  projection.  These  correspondences  are  determined  by  construction. 


3.1  Recovering  Surface  Markings  in  General 

Ideally,  a  representation  of  a  surface  would  contain  information  both  about  shape 
and  about  surface  pattern  features;  in  general,  edges  in  surface  pattern  are  likely 
to be more effective recognition cues due to their relative insensitivity to
illumination, but for some applications such cues as pattern colour and lightness
may  be  appropriate.  The  surface  marking  information  should  be  provided  as  a 
pattern  on  a  representation  of  the  surface  (i.e.  referred  to  its  position  on  the 
surface  in  some  canonical  frame’^),  so  that  foreshortening  effects  and  the  like  can 

^  A  canonical  representation  of  the  object’s  markings  is  one  that  is  independent  of  the 
camera  geometry.  For  instance,  a  canonical  representation  of  a  ruled  surface  such 
as  a  drinks  can  would  be  the  unwrapped  planar  surface  (modulo  starting  point), 
bounded  by  a  rectangle. 


Fig.  7.  Four  typical  images,  from  which  models  were  recovered. 


be  discounted.  If  the  geometry  of  an  object  can  be  recovered  from  an  image, 
then  it  is  generally  also  possible  to  recover  a  representation  that  incorporates 
surface  markings. 

Typically,  processes  that  recover  geometry  from  outline  information  alone  in 
an  uncalibrated  camera  can  only  recover  the  projective  geometry  of  an  object. 
This means that an object is associated with a large equivalence class of
possible recovered representations, each within a projective transformation of the
original object geometry. The property required, that the surface markings be
in the “right place” on the surface, is more properly referred to as covariance.
Covariance in recovering surface markings would mean that, for any two
projectively equivalent representations of an object recovered from two images of that
object,  the  projective  equivalence  extends  to  the  surface  markings  as  well. 

It is possible to produce algorithms with this property, given that one allows
for occlusion of surface markings. The natural approach is to compute a camera
matrix that takes the inferred object geometry to the image outline; this matrix
will be a three by four matrix (homogeneous coordinates), whose kernel represents
the focal point of the camera, in the model frame. The preimage of an image
point under this matrix will be a line in space, passing through the focal point;
the intersection, closest to the focal point, between this ray and the object must
then be marked with the grey-level or colour observed at that image point. Thus,
transferring surface markings from the image to the representation involves an
intersection process, akin to ray-tracing. Because the points obtained by this process are defined by


Fig.  8.  Six  frames  from  a  motion  sequence,  showing  a  box  for  soda  cans  tumbling. 

intersections,  the  construction  is  covariant. 
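
As a minimal sketch of this intersection process, assume the camera matrix is available as a 3x4 list of rows `P` and that the surface has been broken into planar facets, each given as a plane $(a, b, c, d)$ with $aX + bY + cZ + dW = 0$. The function names are hypothetical. The focal point is the null vector of `P`; an image line $l$ back-projects to the plane $P^T l$, so the ray through an image point is the meet of two such planes, and its meet with a facet plane is again a null vector. Choosing the intersection closest to the focal point among several facets is not shown.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def null_vector(rows):
    """Null vector of a rank-3 3x4 matrix, from its signed 3x3 minors."""
    return [(-1) ** k * det3([[r[j] for j in range(4) if j != k] for r in rows])
            for k in range(4)]


def focal_point(P):
    """Homogeneous focal point: the kernel of the camera matrix."""
    return null_vector(P)


def backproject_to_plane(P, x, plane):
    """Homogeneous point of `plane` that projects to the image point x = (x0, y0)."""
    l1 = (1.0, 0.0, -x[0])  # image line X = x0
    l2 = (0.0, 1.0, -x[1])  # image line Y = y0
    # An image line l back-projects to the plane P^T l through the focal point.
    pl1 = [sum(P[i][j] * l1[i] for i in range(3)) for j in range(4)]
    pl2 = [sum(P[i][j] * l2[i] for i in range(3)) for j in range(4)]
    # The ray is the meet of pl1 and pl2; its meet with `plane` is a null vector.
    return null_vector([pl1, pl2, list(plane)])
```

Since every step is a join or a meet, applying a projective transformation to the model (and the corresponding change to `P`) transforms the recovered point in the same way, which is the covariance property described in the text.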

The  most  significant  ambiguities  that  will  operate  here  will  be  the  effects  of 
illumination  and  self-occlusion.  The  traditional  solution  to  illumination  problems 
is  to  use  edges,  and  it  is  clear  that  edge  maps  can  be  transferred  to  geometric 
representations  as  well  as  grey-levels.  Self-occlusion  represents  a  more  interesting 
problem  in  determining  feature  properties. 

3.2  Recognition  Using  Surface  Markings 

This  form  of  representation  can  be  used  to  support  a  hierarchical  recognition 
system,  that  uses  both  shape  and  surface  marking  information  to  represent  and 
recognise  objects.  As  the  geometry  of  objects  of  this  form  is  so  simple  (as  we  have 
seen,  it  is  completely  determined  by  a  plane  curve  and  a  line),  objects  can  easily 


Fig.  9.  Six  frames  from  a  motion  sequence,  showing  a  soda  can  tumbling  backwards. 


Fig. 10. Two views of a three dimensional world created and rendered using the
representations described in the text.


be indexed using a geometrical description. At that point, the surface marking
information, which is in a canonical frame, can be used to generate further
indexing information for the surfaces using the methods of, for example, Nayar and
Bolle [9]. A judicious use of surface marking information is likely to break the
projective ambiguity implicit in the geometrical reconstruction - for example, a
container for cans of grape soda and a matchbox are projectively equivalent, but
have different markings on their faces. Figure 11 shows the canonical
representations of face texture for three views of three different object faces.

Geometric representations of extruded surfaces can, therefore, be effectively
texture mapped using image data. However, grey-level texture maps are
extremely sensitive to illumination details, making them potentially ineffective
for recognition. It is possible to texture map the recovered representation with
image edges to overcome this difficulty, and figures 12 to 14 show canonical
representations of face edges for a range of surfaces. In general, these
representations look the same for different views of the same faces and different
for views of different faces, and so should allow effective indexing. In fact, the
problem of constructing an indexing process that works reliably for a large number
of faces is not fully solved; much of the difficulty appears to stem from spatial
quantisation noise, as when a face is strongly foreshortened, an image pixel may
correspond to a large clump of pixels in the canonical representation.
Constructing robust indexing techniques that use the surface marking information
in these representations is the subject of active research.
those of figures 12 and 13; note that different views of the same face look the same,
and  views  of  different  faces  look  different. 
Abstract

Extruded  surfaces  consist  of  a  section  of  general  cone  cut  by  two  planes.  Such  surfaces 
possess  a  simple  geometry,  yet  are  widespread  in  the  real  world.  The  geometry  of  an 
extruded surface can be recovered from a single uncalibrated image. This geometric
information can be fleshed out with the surface markings of the object, leading to a highly
realistic  3D  representation,  which  can  be  used  for  recognition  or  for  rendering.  We  show 
examples  based  on  images  of  real  scenes. 


1  Introduction 

Efficient  object  recognition  programs  require  distinctive  object  representations;  in  many 
applications, such as surveillance, video processing and image databases, only a single
image is available. This paper shows how to recover an object representation that captures
both  shape  and  pattern  for  a  large  class  of  surfaces  covering  many  man-made  objects. 


1.1  Recognition  using  indexing 

In  typical  modern  systems,  recognition  proceeds  by: 

•  Feature  extraction:  Edges  are  extracted  and  the  resulting  geometric  primitives 
are  grouped  into  likely  object  groups  (for  example,  a  group  of  five  lines,  of  two 
conics,  or  of  a  single  “M-curve”  in  [11]). 

•  Indexing and hypothesis merging: Feature groups are used to compute
geometric primitives to index the object in a model base (for example, the projective
invariants of a pair of conics in [11]). Normally, one object leads to several groups
of  features  indexing  the  appropriate  model,  so  the  resulting  hypotheses  must  be 
merged  using  consistency  criteria  to  obtain  a  single  recognition  hypothesis. 

•  Verification:  Hypothesized  objects  are  back-projected  into  the  image;  the  results 
are  used  to  obtain  further  information  that  may  confirm  the  hypothesis. 

Systems  of  this  sort  have  been  demonstrated  for  plane  objects  in  a  number  of  papers 
[3,  11,  4,  7,  13,  16].  Typically,  object  models  consist  of  a  system  of  invariant  values 
and  are  therefore  relatively  sparse,  meaning  that  hypothesis  verification  is  required  to 
confirm  a  model  match.  However,  no  searching  of  the  model  base  is  required  because 
the  hypothesised  object’s  identity  is  determined  by  the  invariant  descriptors  measured, 
so  that  systems  with  relatively  large  model  bases  can  be  constructed  (currently,  of  the 
order  of  thirty). 
Figure 1: The left hand figure shows the outline and contour generator of a curved object,
viewed  from  a  perspective  camera.  The  right  hand  figure  shows  a  range  of  examples  of 
extruded  surfaces,  most  with  vertex  at  infinity. 


1.2  Recognising  curved  surfaces 

The  indexing  systems  described  assume  either  that  the  model  base  contains  only  plane 
or  polyhedral  objects,  or  that  depth  data  is  available.  Recognising  curved  surfaces  in 
single  images  presents  a  particularly  difficult  problem,  as  the  change  in  appearance  of 
a  curved  surface  as  it  is  imaged  from  different  viewing  positions  is  not  easily  analysed. 
Throughout  the  paper,  we  assume  an  idealised  pinhole  camera.  These  cameras  possess 
a  focal  point  and  an  image  plane.  For  each  point  in  space,  there  is  a  line  through  that 
point  and  the  focal  point;  the  point  in  space  appears  in  the  image  as  the  intersection  of 
this  line  with  the  image  plane  -  figure  1  illustrates  such  a  camera.  An  orthographic  view 
occurs  when  the  pinhole  is  “at  infinity”. 

If  the  focal  point  is  fixed  and  the  image  plane  is  moved,  the  resulting  distortion  of 
the  image  is  a  collineation.  In  what  follows,  it  is  assumed  that  neither  the  position  of  the 
image  plane  with  respect  to  the  focal  point  nor  the  size  and  aspect  ratio  of  the  pixels  on 
the  camera  plane  is  known,  so  that  the  image  presented  to  the  algorithm  is  within  some 
arbitrary  collineation  of  the  “correct”  image.  In  this  model,  the  image  plane  makes  no 
contribution  to  the  geometry,  and  its  position  in  space  is  ignored. 

The  outline  of  a  surface  is  a  plane  curve  in  the  image,  which  itself  is  the  projection 
of a space curve, known as a contour generator. The contour generator is given by those
points  on  the  surface  where  the  surface  turns  away  from  the  image  plane;  formally,  the 
ray  through  the  focal  point  to  the  surface  is  tangent  to  the  surface.  As  a  result,  at  an 
outline  point,  if  the  relevant  surface  patch  is  visible,  nearby  pixels  in  the  image  will  see 
vastly  different  points  on  the  surface,  and  so  outline  points  usually  have  sharp  changes 
in  image  brightness  associated  with  them.  Figure  1  illustrates  these  concepts. 

It  is  generally  accepted  that  the  problem  of  recovering  surface  geometry  from  an 
outline  alone  is  intractable  if  the  surface  is  constrained  only  to  be  smooth,  or  piecewise 
smooth,  as  in  this  case  significant  changes  can  be  made  to  the  surface  geometry  without 
affecting  the  outline  from  a  given  viewpoint.  As  a  result,  an  important  part  of  the  problem 
involves  constructing  as  large  a  class  of  surfaces  as  possible  that  can  either  be  directly 
recognised  or  usefully  constrained  from  their  outline  alone.  There  have  been  advances  in  a 
number of special cases: [5, 8, 2] describe systems for recognising rotationally symmetric
surfaces from outlines, [1, 8, 9, 14, 15] treat straight homogeneous generalised cylinders and
[6]  describes  recognising  algebraic  surfaces  from  outlines.  All  these  systems  emphasize 
surface  geometry  in  recognition. 

Other systems, such as those of [12, 10], emphasize cues such as color and surface
lightness over geometry. These cues are particularly effective in dealing with objects whose
geometry  is  ill-defined  or  hard  to  describe  (such  as  shirts  or  jerseys).  Such  cues  cannot, 
however,  be  completely  divorced  from  shape,  as  by  themselves  they  are  not  particularly 
distinctive.  Furthermore,  perspective  effects  make  it  difficult  to  frame  colour  features 
in  a  truly  shape  independent  fashion;  for  example,  foreshortening  can  have  substantial 
effects  on  a  colour  histogram. 

Ideally,  a  representation  of  a  surface  would  contain  information  both  about  shape  and 
about surface pattern features, such as colour and surface lightness. The colour
information should be provided as a pattern on a representation of the surface (i.e. referred to its
position  on  the  surface  in  some  canonical  frame),  so  that  foreshortening  effects  and  the 
like  can  be  discounted.  This  paper  shows  how  to  construct  such  a  representation  from  a 
single  image  of  an  extruded  surface. 

2  Recovering  extruded  surfaces 

An  extruded  surface  is  a  surface  formed  by  a  section  cut  from  a  general  cone  by  two  planes 
(see  figure  1)  in  such  a  way  that  the  section  of  surface  does  not  include  the  vertex  of  the 
cone. This is the projective generalisation of a surface formed by a system of parallel
lines,  with  plane  ends  (such  a  surface  can  be  extruded  from  a  nozzle).  Extruded  surfaces, 
and  surfaces  made  up  from  extruded  components,  are  extremely  common  -  examples 
include  most  tin  cans,  boxes,  books,  and  many  plastic  bottles. 

The  outline  of  an  extruded  surface  consists  of  a  system  of  line  segments  (possibly 
empty  for  some  views),  the  projection  of  one  plane  section,  and  the  (possibly  occluded) 
projection  of  the  other  plane  section;  in  particular,  for  all  focal  points  outside  a  simple 
“forbidden  zone”,  which  lies  between  the  top  and  bottom  sectioning  planes  and  to  the 
object’s  side  of  their  line  of  intersection,  one  or  the  other  plane  section  is  completely 
visible,  and  there  are  usually  at  least  two  line  segments  in  the  outline. 

The geometry of an extruded surface can be completely reconstructed from a single
image (given that the focal point is generic), because it is so simple. Consider one of
the plane sections, taken with the line on that plane where the two sectioning planes
intersect; call the plane chosen the defining plane and the configuration of plane
section and line a defining plane section or D.P.S. The complete projective geometry
of the surface can be represented by a defining plane section, as it is possible to
choose a projective transformation of space that fixes the D.P.S., transforms the
vertex to any given point off the defining plane section, and transforms the other
sectioning plane, which must pass through the line in the D.P.S., to any given plane
that is not the defining plane and does not contain the vertex. As a result, the
surface can be reconstructed by taking a D.P.S., choosing a second sectioning plane
through the line on the D.P.S. arbitrarily, and choosing an arbitrary vertex in space.
The section of surface that lies between the planes and does not include the vertex
is the reconstruction.

Figure 2: Determining the projection into the image of the line of intersection of the two
sectioning planes.

As at least one plane section is visible in most images, it is possible to recover the
surface’s projective geometry if the projection of the line of intersection between the
planes in space can be recovered in the image. This can be done, if the projection of the
vertex is known, using the following approach (which is illustrated in figure 2):

•  Mark three points, $p_1$, $p_2$ and $p_3$, on one plane section. These correspond to the
three coplanar points $P_1$, $P_2$ and $P_3$ in space.

•  Construct the corresponding points on the other plane section (for objects with a
convex cross-section, self-occlusion is not, in fact, a problem here; see below). Call
these points $p'_1$, $p'_2$, $p'_3$, and the corresponding coplanar points in space
$P'_1$, $P'_2$, $P'_3$.

•  Intersect the image lines $p_1 p_2$, $p'_1 p'_2$ to form $q_{12}$ and the lines $p_2 p_3$,
$p'_2 p'_3$ to form $q_{23}$. Since $P_i$, $P'_i$, $P_j$, $P'_j$ must be coplanar (the lines
$P_i P'_i$ and $P_j P'_j$ are rulings of the surface, and hence pass through the vertex), the
intersections in the image are projections of actual intersections in space of the
lines $P_1 P_2$, $P'_1 P'_2$, etc.

•  Since $P_1 P_2$ lies on the top plane, and $P'_1 P'_2$ lies on the bottom plane, their
intersection must lie on the intersection between the two planes. Thus, the projected
line of intersection must pass through $q_{12}$ and through $q_{23}$.

The  aspect  of  this  construction  most  likely  to  provide  problems  for  early  vision  is 
the  identification  of  corresponding  points,  which  is  complicated  by  self-occlusion.  This 
problem  is  relatively  easy  to  deal  with  if  the  cross-section  of  the  cone  is  convex.  For 
such  cross-sections,  which  are  well  represented  in  applications,  reasoning  from  a  plane 
cross-section  of  the  object  is  sufficient.  It  is  apparent  that  only  three  possible  cases  of 
self-occlusion  are  significant: 

1.  one  sectioning  plane  is  occluded,  and  one  is  visible; 

2.  both  sectioning  planes  are  visible; 

3.  both  are  occluded. 

Figure  3  shows  how  each  case  can  occur.  In  case  3,  it  is  possible  to  recover  the  projective 
geometry  of  the  object  only  partially;  as  this  case  is  “unusual”  (from  the  position  of  the 
focal  point),  we  concentrate  on  the  other  two  cases. 

Figure 3: The left hand figure shows how the possible cases of occlusion of both top
and bottom section curves depend on the position of the focal point with respect to the
cone, and with respect to the top and bottom sectioning planes; this figure shows the
possible cases, for an object viewed in section. Reasoning from a section is sufficient
if the cross-section is convex. The right hand figure shows the effects of occlusion on
determining top-bottom correspondences, for a convex cross section and the three cases
given in the text.

For a convex object, top-bottom correspondences are also easily obtained. Consider
a general plane through the vertex and through the focal point; such a plane (which
appears in the image as a line) cuts the top and bottom sections in a number of
points, some or all of which are visible in the image as points along a line. The cases are
as follows:

•  One section visible: if (say) the bottom section is occluded, all but one of the
points of intersection between the given plane and the bottom section are occluded
(figure 3). Hence, the number of visible intersections is odd, and points appear in a
cyclic permutation of the order $(u_1, u_2, v_2)$ (where $u$ and $v$ are either top or bottom
respectively, and $u_i$ and $v_i$ correspond) along the image line.

•  Both sections visible: if neither section is occluded, all points of intersection will
be visible and there will be an even number of intersections between the outline
and the image line, which appear in a cyclic permutation of the order $(u_1, u_2, v_2, v_1)$
along the image line.

•  Both sections occluded: if both sections are occluded, one point of intersection
will be visible for both top and bottom sectioning plane, and there will be two
intersections between the outline and the image line, which clearly appear in a
cyclic permutation of the order $(u_1, v_1)$ along the image line.

In  practice,  reconstruction  proceeds  as  follows: 

1.  The cross-section of the cone is determined, by finding the image curve corresponding
to the intersection of the cone and the top plane. This curve gives a plane
curve within a projective transformation of the original section, and is projected
into some arbitrarily chosen plane in space.

2.  The projection of the vertex is determined in the image, by intersecting line
segments in the outline. An arbitrary vertex is chosen in space.
Figure 4: Patterns and surface markings are transferred to the representation of the
extruded  surface,  using  the  fact  that  four  known  correspondences  determine  a  plane  to 
plane  projection.  These  correspondences  are  determined  by  construction. 

3.  The  image  line  that  is  the  projection  of  the  line  where  the  top  and  bottom  planes 
intersect,  is  determined  (figure  2,  and  above),  and  projected  onto  the  same  plane 
as  the  cross-section,  ensuring  that  the  configuration  of  line  and  outline  is  projected 
as  one. 

4.  The  cone  is  constructed  by  constructing  lines  through  the  vertex  and  points  on  the 
plane  cross-section. 

5.  The  second  plane  is  chosen;  any  plane  passing  through  the  line  of  intersection, 
and  not  through  the  vertex,  will  do.  The  lines  passing  through  points  on  the  top 
cross-section  and  the  vertex  mark  out  the  bottom  cross-section. 
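
Steps 4 and 5 amount to intersecting lines through the vertex with the second plane. The following is a minimal Python sketch of that construction, not the implementation used for the experiments; the function names, the plane representation (normal . x = offset) and the example coordinates are assumptions made purely for illustration.

```python
# Sketch: mark out the bottom cross-section of a general cone by intersecting
# lines through the vertex with a second plane. All names are hypothetical.

def intersect_line_plane(vertex, point, normal, offset):
    """Intersect the line through `vertex` and `point` with the plane
    normal . x = offset. Returns None if the line is parallel to the plane."""
    direction = [p - v for p, v in zip(point, vertex)]
    denom = sum(n * d for n, d in zip(normal, direction))
    if abs(denom) < 1e-12:
        return None
    t = (offset - sum(n * v for n, v in zip(normal, vertex))) / denom
    return tuple(v + t * d for v, d in zip(vertex, direction))


def bottom_section(vertex, top_points, normal, offset):
    """Push every point of the top cross-section along its line through the
    vertex until it meets the second plane."""
    return [intersect_line_plane(vertex, p, normal, offset) for p in top_points]


# Arbitrary vertex; the top cross-section is a square in the plane z = 2.
vertex = (0.0, 0.0, 4.0)
top = [(1.0, 1.0, 2.0), (-1.0, 1.0, 2.0), (-1.0, -1.0, 2.0), (1.0, -1.0, 2.0)]

# Second plane x - z = 1: it contains the line {x = 3, z = 2}, which lies in
# the top plane, and it does not pass through the vertex.
bottom = bottom_section(vertex, top, normal=(1.0, 0.0, -1.0), offset=1.0)
for top_point, bottom_point in zip(top, bottom):
    print(top_point, "->", bottom_point)
```

Each bottom point lies on the second plane and is collinear with the vertex and its top point, which is all the construction requires.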

2.1  Recovering  surface  markings 

To  transfer  surface  markings  from  the  image  to  the  representation  of  the  surface,  it  is 
convenient  to  break  up  the  cross-section  in  the  image  into  a  polygonal  approximation 
(the  polygons  can  be  arbitrarily  small),  and  then  note  that  this  leads  to  a  representation 
of  the  surface  as  a  system  of  plane  quadrilaterals.  Finally,  the  positions  of  the  vertices  in 
the  image  to  which  the  3D  vertices  project,  are  known.  This  determines  the  projective 
transformation  from  the  image  quadrilateral  to  the  surface  quadrilateral  (four  points  and 
their  images  determine  a  projective  transformation),  and  this  transformation  applied  to 
pattern  points  maps  them  onto  the  surface  representation  (figure  4).  Clearly,  if  this 
texture  mapping  process  works  for  grey-level  surface  markings,  it  will  work  for  color 
surface  markings  as  well.  A  number  of  color  sequences  exist,  although  production  expense 
dictates  that  the  examples  given  show  results  only  for  the  grey  level  case. 
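
The statement that four points and their images determine a projective transformation can be illustrated with a short sketch. The code below is an assumed, simplified formulation (it fixes the last matrix entry to 1, which fails if that entry is really zero) and the quadrilateral coordinates are invented; it is not the code used to produce the figures.

```python
# Sketch: plane-to-plane projective map from four correspondences.
# Assumption: the matrix entry h33 is non-zero, so it can be fixed to 1.

def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4(src, dst):
    """Return the 3x3 matrix mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_h(hm, pt):
    """Apply the projective map to a 2D point (with the homogeneous divide)."""
    x, y = pt
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    return ((hm[0][0] * x + hm[0][1] * y + hm[0][2]) / w,
            (hm[1][0] * x + hm[1][1] * y + hm[1][2]) / w)


# Image quadrilateral (pixels) and the surface quadrilateral it came from.
src = [(10.0, 10.0), (110.0, 20.0), (100.0, 120.0), (5.0, 90.0)]
dst = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
H = homography_from_4(src, dst)

# A pattern point inside the image quadrilateral, mapped onto the surface.
print(apply_h(H, (60.0, 60.0)))
```

Applying the same map to every pattern pixel inside an image quadrilateral transfers the marking to the corresponding surface quadrilateral.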


3  Experimental  results  and  discussion 

This  scheme  has  been  implemented  for  images  of  real  scenes.  We  use  a  simple  manual 
process  to  mark  outline  points  in  images;  in  particular,  the  operator  marks  the  two  lines 
that determine a vertex, points on the top curve, and corresponding points on the bottom
curve. This manual interface was used to allow a quick implementation to demonstrate
the recovery process; we do not believe that it is essential. The representations extracted
are 3D geometrical objects with associated surface markings, and therefore lend themselves
well to display as a movie. Figure 5 shows typical images from which models are
extracted; figure 6 shows frames from movies of tumbling objects. Note that the texture
on the object is stable as the figure tumbles, indicating that the surface markings are
being correctly extracted and placed; note further that, if the operator chooses a projective
frame in which, for example, the soda cans have a circular cross section and roughly
the right aspect ratio, the models are impressively realistic. Figure 7 shows two views of
a simple 3D world constructed using a tool that places extruded models in space with
respect to one another.

Figure 5: Four typical images, from which models were recovered.

Figure 6: Frames from four motion sequences, created from the representations recovered
using the techniques described in the text, showing boxes for soda cans tumbling, and a
soda can tumbling. Note that the faces not visible in the image used for making models
are rendered blank; constructing an integrated representation from a number of views of
an object is an interesting question, at present open. Objects are scaled by hand, and
rendered into a canonical frame. Note also the effects of image spatial quantization on
one face of the box (which was severely foreshortened in the original image).

Figure 7: Two views of a three dimensional world created and rendered using the representations
described in the text.

This form of representation can be used to support a hierarchical recognition system,
that uses both shape and surface marking information to represent and recognise objects.
As the geometry of objects of this form is so simple (as we have seen, it is completely
determined by a plane curve and a line), objects can easily be indexed using a geometrical
description. At that point, the surface marking information, which is in a canonical frame,
can  be  used  to  generate  further  indexing  information  for  the  surfaces  using  the  methods 
of,  for  example,  Nayar  and  Bolle  [10].  A  judicious  use  of  surface  marking  information  is 
likely  to  break  the  projective  ambiguity  implicit  in  the  geometrical  reconstruction  -  for 
example,  a  container  for  cans  of  grape  soda  and  a  matchbox  are  projectively  equivalent, 
but have different markings on their faces. Figure 8 shows the canonical representations
of face texture for three views of three different object faces. In general, these views
look the same for different views of the same faces and different for views of different
faces, and so should allow effective indexing. In fact, constructing an indexing process
that works reliably for a large number of faces is not a fully solved problem; much of the difficulty
appears  to  stem  from  spatial  quantisation  noise,  as  when  a  face  is  strongly  foreshortened, 
an  image  pixel  may  correspond  to  a  large  clump  of  pixels  in  the  canonical  representation. 
Constructing  robust  indexing  techniques  that  use  the  surface  marking  information  in 
these  representations  is  the  subject  of  active  research. 

Abstract. Recognising 3D objects from single images presents a range
of  significant  problems,  mostly  to  do  with  the  nature  and  distinctiveness 
of  the  representations  that  can  be  recovered.  In  such  special  cases  as 
polyhedra,  surfaces  of  revolution,  general  cones,  canal  surfaces  and  al¬ 
gebraic  surfaces,  the  geometry  can  be  recovered  with  varying  degrees  of 
success  and  of  ambiguity.  We  discuss  these  volumetric  primitives,  com¬ 
paring  their  utility  to  that  of  surface  primitives. 

For  a  model  based  recognition  system,  representation  is  not  simply  con¬ 
cerned  with  particular  geometric  primitives,  but  the  entire  recognition 
process.  In  our  view,  representations  should  be  motivated  by  the  way 
quantities  that  are  measurable  in  an  image  influence  decisions  through¬ 
out  the  recognition  process. 

When  some  geometric  information  is  available,  its  potential  distinctive¬ 
ness  can  often  be  substantially  enhanced  by  constructing  representations 
that capture surface markings in an appropriate frame on the surface itself.

Keywords: Object Recognition, Computer Vision, Invariant Theory,

1 Introduction

The fundamental question in discussing representations is “What is the representation
for?”. Representations, such as depth maps and Bezier patches, that
may  be  appropriate  for  tasks  such  as  geometric  modelling,  graphics,  navigation, 
or  grasping  may  not  be  appropriate  for  model  based  recognition.  In  this  paper 
we  explore  representations  for  object  recognition,  with  a  particular  emphasis  on 
representations  for  curved,  3D  objects  that  can  be  extracted  from  a  single  image. 

For  recognition  systems  to  be  successful  and  useful,  they  will  have  to  have 
large  modelbases,  containing  a  wide  variety  of  objects.  For  systems  with  large 
modelbases,  searching  over  object-image  correspondences  to  estimate  pose  and 
then  verify,  is  impractically  expensive.  Existing  recognition  schemes  where  recog¬ 
nition  is  phrased  in  this  way  include  pose-clustering  [35],  alignment  [16,  17],  and 
interpretation trees [15]. Fortunately, object recognition is not pose recovery: representations
can be defined which yield viewpoint-invariant descriptions from
images.  Such  invariant  descriptors  directly  identify  models,  without  first  com¬ 
puting  pose,  avoiding  searching  the  model  base.  As  a  result,  defining  and  manag¬ 
ing  appropriate  object  representations  is  the  central  question  in  building  object 
recognition  systems. 


If  the  representation  is  too  impoverished  then  it  will  not  admit  object- 
independent  constructions  that  take  image  data  alone  and  yield  object  identity, 
as  recent  papers  have  shown  [4,  5,  22].  A  number  of  the  pose  based  schemes  cited 
above are forced to search their model library because the motivating notion of
objects,  as  semi-coherent  clouds  of  points  or  line-segments,  is  too  parsimonious: 
clouds  of  points  are  a  poor  representation  for  most  objects,  because  representing 
an  object  as  a  cloud  of  points  wastes  the  available  and  potentially  rich  structure 
of  outline  and  of  markings. 

It  is  convenient  to  distinguish  between  two  basic  types  of  representation: 

- Volumetric primitives, which are drawn from globally constrained geometries
such  as  generalized  cylinders  and  algebraic  surfaces,  and  can  represent,  for 
example,  vases,  cans  and  boxes.  A  great  deal  is  known  about  extracting 
invariant  representations  of  volumetric  primitives. 

- Surface primitives, which consist of patches of surface, defined by their curvatures
for example, and subject to little or no constraint - they represent
shirts,  Henry  Moore  sculptures,  and  trees.  Very  little  is  known  about  what 
information  should  be  extracted  or  what  is  available. 

Typically,  representations  should  (and  occasionally  do)  include  information 
both  about  the  geometry  of  objects,  and  about  the  markings  that  lie  on  the 
objects.  Geometry  and  markings  are,  to  some  extent,  interdependent;  measure¬ 
ments  of  markings  are  unreliable  unless  referred  to  some  coordinate  system  on 
a  curved  surface  itself  (to  avoid  the  effects  of  foreshortening),  and  markings 
generate  image  clutter  that  obstructs  grouping.  Admitting  markings  is  essen¬ 
tial,  because  it  expands  the  scope  of  any  representational  scheme  -  it  is  hard  to 
distinguish  between  soft-drink  cans  without  exploiting  marking  information. 

In  what  follows,  we  use  the  following  terms: 

-  markings:  patterns  of  reflectance  changes  on  a  surface  -  for  example,  the 
patterns  on  the  cover  of  a  book. 

-  texture:  markings  that  are  more  structured  or  statistical  patterns  of  re¬ 
flectance  changes  -  for  example,  the  patterns  on  a  wooden  desktop. 

-  outline:  the  points  in  an  image  plane  where  the  surface  is  tangent  to  the 
ray  through  the  focal  point  and  the  image  point. 

-  contour  generator:  the  points  on  the  surface  that  project  down  to  the 
outline. 

2  What  is  known 

2.1  Indexing 

A  system  that  must  handle  large  numbers  of  models  requires  indexing,  where 
a  process  that  is  wholly  or  largely  model-independent,  is  used  to  compute  rep¬ 
resentations  that  are  largely  or  wholly  unaffected  by  the  position  and  intrinsic 
parameters of the camera, and that differ from object to object. These descriptions,
often known as indexing functions, have the same value for any view of
a given object, and so can be used to index into a model base without search
(for  example,  [5,  7,  18,  28,  29,  31,  34,  39,  41]).  Note  that  schemes  where  the 
model  must  be  known  to  compute  an  index  (e.g.  [40])  do  not  escape  searching 
the  modelbase,  and  cannot  be  seen  as  solving  the  indexing  problem  in  a  useful 
way. 

Invariant  representations  are  a  natural  goal,  and  have  a  long  tradition  in 
vision.  The  evolution  of  technique  in  representation  in  the  last  decade  or  so 
can  partly  be  charted  by  the  extent  to  which  the  representation  is  invariant:  the 
tangent angle vs. arc length representation [1] is invariant under plane Euclidean
transformations, which correspond to presenting an object in a fronto-parallel plane -
a very restricted viewing geometry; later, Lamdan et al. introduced affine invariant
representations, corresponding to parallel projection with unknown intrinsic
parameters [18]; and then projectively invariant representations [7] covered the
most  general  transformation  between  a  planar  object  and  image,  with  no  re¬ 
strictions  on  pose  or  intrinsic  parameters.  Indexing  using  invariants  yields  an 
attractively  simple  architecture:  in  a  typical  system  that  works  for  plane  ob¬ 
jects,  projective  invariants  are  computed  for  a  range  of  geometric  primitives  in 
the  image;  if  the  values  of  these  invariants  match  the  values  of  the  invariants 
for  a  known  model,  we  have  good  evidence  that  the  image  features  are  within 
a  camera  transformation  of  the  model  features,  and  that  hypothesis  can  be  ei¬ 
ther  combined  with  other  hypotheses,  or  verified  directly.  The  efficiency  of  this 
indexing  process  means  that  systems  with  non-trivial  sizes  of  model  base  can  be 
constructed^1.
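
The indexing architecture described above can be sketched as a table keyed by quantised invariant values. This is an illustrative sketch only: the model names, invariant values and cell size are invented, and a practical system would also probe neighbouring cells to cope with measurement noise near cell boundaries.

```python
import math
from collections import defaultdict

# Sketch: index into a model base by quantised invariant vectors, with no
# search over object-image correspondences. All values are hypothetical.


def quantise(invariants, cell_size):
    """Map a vector of invariant values to the table cell that contains it."""
    return tuple(math.floor(v / cell_size) for v in invariants)


def build_index(models, cell_size):
    """Store each model name in the cell addressed by its invariants."""
    table = defaultdict(list)
    for name, invariants in models.items():
        table[quantise(invariants, cell_size)].append(name)
    return table


def lookup(table, measured, cell_size):
    """Return the recognition hypotheses indexed by measured invariants."""
    return table.get(quantise(measured, cell_size), [])


models = {"bracket": (0.84, 1.23), "gasket": (0.42, 2.12)}
cell_size = 0.05
table = build_index(models, cell_size)

# Invariants measured in an image; a hit is a hypothesis still to be verified.
print(lookup(table, (0.83, 1.22), cell_size))
print(lookup(table, (0.10, 0.10), cell_size))
```

The cost of a lookup does not depend on the number of models stored, which is the property that makes indexing attractive for large model bases.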

It  must  be  possible  to  extract  indexing  functions  from  images  under  realistic 
conditions.  For  this  reason  they  cannot  be  too  local,  as  this  compels  the  use 
of  high  derivatives  which  cannot  be  measured  locally,  or  too  global,  and  thus 
dependent  on  all  features  being  present  and  grouped.  There  has  been  a  trend, 
analogous  to  the  invariance  of  representations  described  above,  of  moving  away 
from global descriptions such as moments and Fourier descriptors (both of which
are  hopelessly  inadequate  when  features  are  missing  -  either  due  to  occlusion, 
or  to  the  ever-present  problem  of  “drop  out”  with  feature  detectors)  through  to 
modern  semi-local  descriptors  (e.g.  [37]). 


2.2  Volumetric  primitives 

In  the  computer  vision  literature,  there  is  a  long  tradition  in  the  use  of  volumetric 
primitives,  generalized  cylinders  being  the  strongest  current.  Knowing  that  one 
is  looking  at  a  part  of  a  volumetric  primitive  characterized  by  a  small  number  of 
parameters is a very powerful constraint. An analogy with statistical inference is
appropriate: if the data can be well modeled as arising from a parametric model
of one of the standard types, the right strategy is to use the data to estimate
the  (small  number  of)  model  parameters  and  then  use  the  model  to  answer  the 
various  questions  one  might  have. 

^1 Current systems using indexing functions have model-bases containing of the order
of  thirty  objects. 


In  contexts  when  the  objects  are  well  modeled  by  volumetric  primitives,  not 
much  benefit  is  derived  from  aspect  graphs.  Given  that  there  exists  a  model  with 
a  small  number  of  parameters,  the  multiple  views  are  redundant. 

Of  course,  just  as  parametric  statistics  is  appropriate  only  for  a  subset  of  real 
world  situations,  simple  volumetric  primitives  are  not  general  enough  to  deal 
with  the  wide  variety  of  man-made  and  natural  objects  -  crumpled  newspaper 
is  one  good  example. 

One  approach  to  indexing  a  3D  object  would  be  to  determine  a  comprehen¬ 
sive  reconstruction  of  its  3D  geometry,  and  then  abstract  geometric  invariants 
from  that  representation;  in  general,  this  is  not  possible.  In  some  cases,  for  exam¬ 
ple,  surfaces  of  revolution,  complete  reconstructions  are  definitely  unavailable  in 
a  single  image  (some  parameters  are  unresolved).  However,  for  a  number  of  vol¬ 
umetric  primitives  it  is  possible  to  extract  index  functions,  from  a  single  image, 
which measure some of the object's 3D geometric properties:

-  Polyhedra:  For  polyhedra  that  are  position  free,  and  have  “many”  quadri¬ 
lateral  faces,  projective  invariants  can  be  recovered  from  a  single  image  ([30], 
after  [32]);  these  invariants  are  measurable  because  of  the  rich  incidence 
structure of these polyhedra.

-  Repeated  structures:  A  single  view  of  an  object  with  an  n-fold  reflectional 
symmetry  is  equivalent  to  n  views  of  a  section  of  that  object;  for  polyhedral 
objects  and  space  curves,  this  has  been  used  to  reconstruct  the  projective 
geometry  of  such  objects  [24,  30]. 

- Surfaces of revolution (SOR): Cross-ratios of a series of points, defined
by  outline  bitangent  lines,  on  the  imaged  axis  of  a  SOR,  yield  projective 
invariants  of  the  surface  [9,  19].  A  further  construction  allows  outlines  to 
be  transferred  from  view  to  view  [36],  but  the  information  available  in  the 
image  is  insufficient  to  reconstruct  the  surface  (two  further  parameters  are 
required). 

- Straight homogeneous generalised cylinders: It is possible to reconstruct
some SHGCs from image information alone [42]; the ambiguity in the
reconstruction is not specified, but appears to be at least an elation through
the focal point.

- Algebraic surfaces: The complete projective geometry of a generic algebraic
surface can be recovered from its outline in a single view through a
generic  focal  point;  the  algorithm  given  is  too  complex  for  practical  use  [6]. 

- Canal surfaces: A canal surface is a generalised cylinder where the swept
curve  is  a  circle  of  constant  radius,  with  the  plane  of  the  circle  orthogonal  to 
the  axis  curve.  In  the  case  of  a  planar  axis,  the  axis  curve  can  be  recovered, 
from  the  outline  alone,  up  to  an  affine  ambiguity  [26].  Consequently,  affine 
invariants  of  the  axis  curve  can  be  used  as  index  functions,  but  the  surface 
can  only  be  recovered  up  to  an  affine  ambiguity. 
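
Several of the index functions listed above are cross-ratios of collinear points, for example points along the imaged axis of a surface of revolution. The sketch below checks numerically that a cross-ratio survives a projective map of the line. The ordering convention used for the cross-ratio, the point positions and the map coefficients are assumptions chosen for illustration; exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction

# Sketch: the cross-ratio of four collinear points is unchanged by a
# projective transformation of the line. Values below are hypothetical.


def cross_ratio(x1, x2, x3, x4):
    """Cross-ratio of four points given by their positions along a line,
    using the convention ((x3-x1)(x4-x2)) / ((x3-x2)(x4-x1))."""
    return ((x3 - x1) * (x4 - x2)) / ((x3 - x2) * (x4 - x1))


def projective_map(t, a, b, c, d):
    """1D projective map t -> (a t + b) / (c t + d), with ad - bc != 0."""
    return (a * t + b) / (c * t + d)


# Positions of four points along the axis in some model frame.
points = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]

# The same points as they might appear along the imaged axis.
imaged = [projective_map(t, 2, 1, 1, 3) for t in points]

print(cross_ratio(*points), cross_ratio(*imaged))
```

Both calls return the same value although the individual point positions differ, which is why such quantities can serve as index functions.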

2.3  Surface  primitives 

For  a  representation  to  be  effective  for  recognition,  it  must  be  an  abstraction  of 
the surface description produced by throwing away irrelevant detail. A pointwise
map of principal curvatures is not much more useful than a pointwise map of
normals  or  depths. 

A  major  program  of  research  in  trying  to  understand  surfaces  at  a  higher  level 
of abstraction was initiated by Koenderink and Van Doorn, paralleling work in
mathematics  on  singularity  theory.  Essentially  they  tried  to  understand  visual 
events:  how  does  the  topology  of  the  outline  change  for  an  observer  moving 
around  the  object?  This  leads  naturally  to  the  notion  of  aspect  graphs.  This 
line of attack has been pursued to its logical conclusion: the development of
algorithms and the understanding of their computational complexity is now complete
for both polyhedra [14] and curved objects [27]. The computational complexity
has  proved  to  be  very  high,  making  the  practical  use  of  exact  aspect  graphs 
of  general  curved  surfaces  unlikely.  An  even  more  serious  criticism  is  that  the 
abstraction  is  the  wrong  kind  of  abstraction.  The  formalism  doesn’t  take  into 
account  the  scale  of  the  different  topological  changes  of  the  outline.  For  a  human 
observer,  and  a  computer  vision  system  that  has  to  start  from  a  brightness  image, 
the  scale  of  the  feature  has  a  major  impact  on  its  detectability  and  localizability. 
While  the  usefulness  of  multiple  view  representations  remains  debatable,  it 
seems  clear  that  a  concern  for  what  one  can  hope  to  reliably  extract  from  early 
vision  needs  to  inform  the  choice  of  representation  formalism.  From  a  single 
image,  outline,  texture  and  shading  are  the  main  cues  available.  Texture  cues 
can  be  treated  locally  [21]  so  when  we  are  lucky  enough  to  have  visually  resolvable 
texture,  we  can  locally  get  surface  normals  and  curvatures.  It  is  the  shading  cue 
that  is  more  difficult  to  analyze;  global  interactions  due  to  interreflections  [8] 
make Horn style shape-from-shading theory unusable in general contexts. As
of  present  writing,  surface  primitives  derived  from  image  data  have  not  been 
successfully  applied  in  recognition,  and,  as  it  is  unlikely  that  this  position  will 
change  in  the  short  term,  we  do  not  treat  surface  primitives  further. 

2.4  Markings 

There  have  been  few  attempts  to  exploit  surface  markings  explicitly  in  recog¬ 
nition;  most  such  attempts  completely  exclude  geometry  [25,  33].  Some  of  the 
difficulties  usually  cited  include  variations  in  colour  caused  by  illuminant  effects 
or  interreflections,  difficulty  of  segmenting  marked  objects,  and  the  effects  of 
foreshortening  on  markings.  However,  it  is  easily  shown  that,  if  the  geometry  of 
an  object  can  be  recovered  up  to  a  projective  ambiguity  from  a  single  view,  then 
the  markings  on  that  object  can  be  covariantly  recovered  by  mapping  image 
grey-levels back on to the geometry recovered^2. This observation has been used
to  produce  representations  that  incorporate  markings  (in  [11]),  but  no  strategies 
for  exploiting  the  markings  were  proposed. 

^2 This follows because the process of mapping texture back involves intersecting lines
through the focal point with the surface’s geometry; since forming intersections is
covariant (preserved) under projective ambiguities, backprojecting image texture
must be covariant under projective ambiguities.

Recognition of 2D textures is a well-studied problem with numerous successful
applications in remote sensing, inspection etc. Usually it is approached with
statistical classification techniques (see e.g. [12] for a review) given a suitable set
of texture features. Various features have been used in the literature, including
some derived from co-occurrence matrices, Fourier domain features, and those
based on convolving the image with multiscale, multi-orientation linear filters.
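
As a concrete instance of one of the feature families mentioned above, the following sketch computes a grey-level co-occurrence matrix for a single pixel offset and derives one scalar feature (contrast) from it. The tiny two-level image, the offset and the choice of feature are assumptions for illustration; this is not drawn from any particular system cited in the text.

```python
# Sketch: grey-level co-occurrence counts for one offset, plus a contrast
# feature. The image and the offset are hypothetical examples.


def cooccurrence(image, levels, dr, dc):
    """counts[i][j] = number of pixel pairs with grey level i at (r, c) and
    grey level j at (r + dr, c + dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def contrast(counts):
    """Mean squared grey-level difference over all counted pairs."""
    total = sum(sum(row) for row in counts)
    weighted = sum((i - j) ** 2 * n
                   for i, row in enumerate(counts)
                   for j, n in enumerate(row))
    return weighted / total


image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 0]]

counts = cooccurrence(image, levels=2, dr=0, dc=1)  # right-hand neighbour
print(counts, contrast(counts))
```

Features of this kind would then be fed to a statistical classifier; they are only meaningful for markings once the effects of foreshortening have been removed.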


3  Fundamental  notions 

Indexing  is  a  concept  so  basic  to  recent  work  on  representation  that  it  had  to 
be  expounded  (above,  section  2.1)  before  the  work  was  discussed.  Clearly,  one 
cannot  build  a  large  fast  recognition  system  without  some  form  of  indexing,  and 
this  gives  some  clues  to  appropriate  areas  for  future  research.  However,  indexing 
alone  will  not  answer  all  the  difficulties  of  utilising  a  large  model  base  with  a 
wide  variety  of  objects.  Some  form  of  structuring  or  organisation  is  required  at 
every  level  of  the  recognition  process.  This  organisation  is  centred  on  the  notion 
of  object  class. 


3.1  Class 

The  collection  of  known  volumetric  cases  (SOR,  algebraic  ...)  above  induces  a 
natural, utilitarian, geometric notion of class [43]: at early stages, constraints
derived from the particular class are used to guide segmentation and grouping;
at later stages viewpoint invariant or viewpoint stable descriptors for that class
are computed from these groups; index functions are computed from these descriptions
and paired with appropriate model class sub-libraries, and so on. In
this  way  an  appropriately  structured  representational  system  can  be  used  to 
control  complexity  at  all  levels  in  the  recognition  process. 

Using  model  classes  in  this  way  can  be  thought  of  as  a  modern  version  of 
Shape from X, where X is now a model class. In an update of Binford's use of
models  [2,  3],  we  use  model  class  to  facilitate  three  distinct  tasks: 

1.  Grouping 

Recent  work  by  Zerroug  and  Nevatia  shows  that  one  can  recover  volumetric 
SHGC  descriptions  from  fragments  of  the  outline  in  the  presence  of  occlusion, 
clutter  and  extraneous  edges  [42].  For  a  surface  of  revolution  this  grouping 
can  be  accomplished  without  requiring  a  cross-sectional  curve  in  the  im¬ 
age  [43].  The  grouped  outline  also  enables  simpler  computation  of  the  next 
two  levels. 

2.  Invariant  Representation 

Using  the  outline  alone,  3D  surfaces  can  often  only  be  recovered  up  to  a 
parameterised  family  of  surfaces  that  could  have  projected  to  the  outline.  In 
contrast,  an  invariant  representation  can  be  obtained  which  is  sufficient  for 
recognition. For example, the axis cross-ratios obtained from corresponding
outline bitangents for SORs are invariant to projective transformations, and
can  be  used  as  index  functions,  but  the  surface  is  not  determined. 


3.  Recovery  of  3D  shape 

With  additional  information,  e.g.  camera  calibration,  both  pose  and  3D 
shape  can  be  determined.  For  example,  for  a  SOR  if  a  cross-sectional  cir¬ 
cle  is  visible  and  the  camera  is  calibrated,  pose  is  determined  up  to  a  one 
parameter  family  (distance  from  the  camera),  and  the  shape  of  the  surface 
recovered  from  the  outline  up  to  an  overall  scaling. 

This  notion  of  class  is  organised  around  measurable  image  cues.  There  is  no  point 
in  drawing  class  distinctions  that  cannot  be  measured  in  images;  furthermore, 
an ideal notion of class involves a notion of emerging object identity, where
the  behaviour  of  later  recognition  modules  is  conditioned  by  class  hypotheses 
established  earlier.  For  example,  given  that  an  image  line  is  likely  to  have  come 
from  a  polyhedron  rather  than  from  a  surface  of  revolution,  it  is  more  productive 
to  organise  the  grouping  strategy  around  constructing  object  faces. 


3.2  Consistency 

Typically,  in  systems  that  use  indexing,  recognition  hypotheses  are  not  mono¬ 
lithic;  several  image  groups  may  index  to  object  substructures.  It  is  then  essential 
to  determine  which  of  these  hypotheses  can  be  fused  into  an  hypothesis  about 
object  identity,  requiring  a  notion  of  consistency.  As  a  second  example,  hypothe¬ 
ses  about  object  identity  are  equivalent  to  hypotheses  about  such  matters  as 
camera  internal  parameters  and  illuminant  colour,  and  some  pairs  of  hypotheses 
about  object  identity  may  be  mutually  incompatible.  By  enforcing  compatibility 
between  these  hypotheses,  we  may  extract  information  about  the  camera  [10]  or 
the  illuminant. 

Consistency  appears  to  offer  mechanisms  by  which  the  geometry  of  unknown 
volumetric  primitives  can  be  constrained,  using  reconstructions  of  the  known 
objects  surrounding  them.  Without  a  modelbase,  image  data  can  constrain  an 
object’s Euclidean geometry to at best a four parameter ambiguity^3. This ambiguity
can be constrained further by the use of, say, such cues as occlusion,
support,  or  the  approximate  size  of  typical  objects. 
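
One way to make the four parameter ambiguity concrete is sketched below, under the assumption that the focal point sits at the origin of the world frame and that world points are written in homogeneous coordinates (X, Y, Z, W). A transformation that leaves X, Y, Z alone and replaces W by a1 X + a2 Y + a3 Z + d W has four free parameters, keeps every point on its ray through the focal point, and so cannot change the image. The parameter names and sample points are hypothetical.

```python
from fractions import Fraction

# Sketch: a four-parameter family of projective transformations of space
# that leaves the image unchanged. Assumes the focal point is the origin.


def apply_ambiguity(point, a, d):
    """Map (X, Y, Z, W) to (X, Y, Z, a . (X, Y, Z) + d W), with d non-zero."""
    x, y, z, w = point
    return (x, y, z, a[0] * x + a[1] * y + a[2] * z + d * w)


def image_of(point):
    """Pinhole projection onto the plane Z = 1 from the origin."""
    x, y, z, _ = point
    return (Fraction(x, z), Fraction(y, z))


def euclidean(point):
    """Inhomogeneous 3D coordinates of a homogeneous point."""
    x, y, z, w = point
    return (Fraction(x, w), Fraction(y, w), Fraction(z, w))


points = [(1, 2, 4, 1), (-3, 1, 5, 1)]
a, d = (1, -2, 3), 2            # the four parameters (a1, a2, a3, d)
moved = [apply_ambiguity(p, a, d) for p in points]

for p, q in zip(points, moved):
    print(euclidean(p), "->", euclidean(q), "image:", image_of(p), image_of(q))
```

The 3D positions change while the image coordinates do not, so no amount of image data alone can distinguish the two configurations; extra cues such as those listed above are needed.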


^3 This is the family of projective transformations that fixes the focal point and the
cone of rays joining the object to the focal point; as a result, if this transformation
is applied to the world, the image is unaffected, which means it is a fundamental
ambiguity.


3.3 Richer descriptors

It is a commonplace that richer descriptors should make recognition easier. The
basis for this argument is simple; the number of cells in an indexing table goes
up exponentially with the dimension, meaning that, in principle, cramped tables
can be made spacious by making another measurement. However, success in
constructing such descriptors has been generally poor. Cues such as colour and
markings, which are widely recognised as having great potential, have not yet
been used effectively. The present generation of volumetric representations can
be made richer in three ways: by increasing the geometric detail recovered, by
recovering more distinctive geometric measurements, and by associating texture
and marking cues with the geometry recovered.
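
The arithmetic behind the "cramped tables" argument is simple enough to spell out. The sketch below assumes, purely for illustration, that every measurement is quantised into the same number of bins and that models spread evenly over the table; real invariants are neither independent nor uniformly distributed, so the figures are only indicative.

```python
# Sketch: how the number of cells in an indexing table grows with the number
# of measurements. Bin count and model count are hypothetical.


def table_cells(bins_per_dimension, dimension):
    """Number of cells when each of `dimension` measurements has the same
    number of bins."""
    return bins_per_dimension ** dimension


def mean_occupancy(num_models, bins_per_dimension, dimension):
    """Average number of models per cell under an even spread."""
    return num_models / table_cells(bins_per_dimension, dimension)


for dimension in range(1, 5):
    print(dimension,
          table_cells(20, dimension),
          mean_occupancy(1000, 20, dimension))
```

With 20 bins per measurement and 1000 models, going from two to three measurements takes the table from 400 to 8000 cells, and the mean occupancy from 2.5 models per cell to 0.125.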

Increasing  geometric  detail  is  not  a  promising  strategy;  as  the  history  of 
3D  from  3D  matching  shows,  a  significant  component  of  representation  involves 
identifying  and  disposing  of  irrelevant  information.  For  volumetric  primitives, 
understanding  how  to  recover  more  distinctive  measurements  and  understanding 
which  measurements  are  more  stable  and  effective  in  indexing  represents  an 
important,  largely  open,  problem.  Referring  markings  to  a  surface  coordinate 
frame  (as  in  figure  1),  and  measuring  descriptors  of  object  patterns  in  that  frame, 
will  provide  substantially  increased  richness  in  representation. 

4  What  should  be  done 

Effective notions of class and of indexing are essential for building large model-based
vision systems; class, because it can be used to organise both the modelbase
and the segmentation process, and indexing, to restrict search of the
modelbase. These areas represent those in which we believe the most concrete
progress is possible. Progress in indexing is likely to be most concrete; it remains
important to expand and strengthen notions of the volumetric primitives
for  which  indexing  functions  can  be  constructed;  to  incorporate  surface  markings 
into  the  construction  of  indexing  functions;  and  to  generate  a  reasonable  theory 
of  how  to  choose  amongst  possible  indexing  options. 

The  extent  to  which  surface  markings  have  been  neglected  to  date  is  surpris¬ 
ing,  as  they  present  the  best  way  to  expand  the  size  of  modelbases  possible  with 
currently  understood  volumetric  primitives.  We  see  two  approaches  to  managing 
the  effects  of  foreshortening  in  images  of  markings:  constructing  pose-invariant 
features  that  describe  markings;  or,  using  the  inferred  geometry  to  refer  the 
image  detail  back  to  a  surface  coordinate  frame.  The  second  approach  is  more 
general.  The  neglect  of  markings  is  particularly  puzzling  because  the  problems 
presented by markings appear to be concrete and accessible with present knowledge.

To conclude: We have argued in this paper that representation is not simply a matter of
whether a particular primitive (super-quadric, SHGC, etc.) is used. Instead, representation
in visual recognition is the entire process from the early stages of segmentation and
grouping, through the extraction of viewpoint-invariant descriptions, hypothesis
combination, to finally the recovery of 3D shape. The implementation of this process that
we suggest, based on current achievements, is a geometric notion of class using volumetric
primitives.
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Abstract 

Characteristic sets as defined by Ritt in his book Differential Algebra are resurrected.
It is shown that Ritt's definition of a characteristic set is quite different from that
used by Wu. A characterization theorem for Ritt's characteristic sets for polynomial
ideals is proved. An algorithm for computing a characteristic set is given. It is shown
how characteristic sets can be used to study the structure of ideals as well as their
associated varieties. Many properties of ideals and associated varieties can be computed
directly from their characteristic sets.


1  Introduction 

Ritt, in his book Differential Algebra (1950), introduced the concept of a characteristic
set  of  a  set  of  differential  algebraic  polynomials  from  a  differential  polynomial  ring,  and 
demonstrated  its  utility  in  studying  properties  of  differential  polynomial  ideals  as  well  as 
differential  varieties. 

In 1978, Wu Wen-Tsun [28, 29, 30] revived interest in characteristic sets by showing how
they can be used for mechanizing geometry theorem proving and, later, for studying zeros
of polynomial equations. Wu made many significant contributions in making Ritt's
constructions practical, and extended Ritt's concepts to make them suitable for solving
polynomial equations. It turns out, however, that Wu's definition of a characteristic set
is quite different (in a fundamental sense) from Ritt's original definition.

In  this  paper,  we  critically  examine  Ritt’s  definitions  and  contrast  them  with  Wu’s 
definitions.  We  propose  a  framework  for  studying  Ritt’s  characteristic  sets.  We  give  a 
necessary and sufficient condition for a set of polynomials to be a characteristic set. We
also  give  an  algorithm  for  computing  a  characteristic  set  from  a  basis  of  a  polynomial  ideal. 
We  show  how  Ritt’s  characteristic  set  can  be  used  for  geometry  theorem  proving,  equation 
solving,  computing  dimension  as  well  as  for  decomposing  a  variety. 

*Supported in part by a grant from United States Air Force Office of Scientific Research
AFOSR-91-0361. The results reported in this paper were presented in an invited talk at an
international workshop Mechanization of Mathematics, Chinese Academy of Science, Beijing,
China, July 1992.
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Ritt’s  Definition  of  Characteristic  Set 


Ritt, in his book, Differential Algebra (1950) (p. 5), defined the concept of a
characteristic set of a set of polynomials from a polynomial ring $k[y_1, \ldots, y_n]$ as
a chain of the lowest rank in the set. A chain is a finite system of polynomials in
triangular form (with each polynomial in the set introducing at least one new variable).
See the next subsection below for precise definitions.

After introducing the definition, Ritt remarked on p. 5 that a necessary and sufficient
condition for a chain C in a set S of polynomials to be a characteristic set of S is that
there is no non-zero polynomial p in S that is reduced with respect to C, or equivalently,
every non-zero polynomial p in S can be reduced using C. Let us call this property Ritt's
first characterization theorem of characteristic sets.

Theorem 1 (Ritt) A chain C in a set S of polynomials is a characteristic set of S if and
only if every non-zero polynomial in S can be reduced by C.

Ritt also gave a method for computing such a characteristic set from a finite S (p. 5). It
is easy to see that even for infinite S, a characteristic set exists. Given a total ordering
on variables, C is a finite maximal subset of minimal reduced polynomials in triangular form
in S. In other words, pick a minimal polynomial $c_1$ in S and include it in C. Delete from
S all polynomials which can be reduced by $c_1$ (see footnote 1), and from the remaining
subset of S, called $S_1$, pick a minimal polynomial $c_2$. The class of $c_2$ is
necessarily higher than the class of $c_1$. Delete from $S_1$ all polynomials which can be
reduced using $c_1$ and $c_2$, and from the remaining subset $S_2$, pick a minimal
polynomial $c_3$, and so on. This process terminates since there are only finitely many
indeterminates and a chain cannot have more than n elements.
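
The selection procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ritt's own formulation: a polynomial is encoded as a dictionary from exponent tuples to coefficients, position i of a tuple holds the exponent of the (i+1)-th variable in the ordering, and the helper names (`degree`, `poly_class`, `rank`, `is_reduced`, `lowest_rank_chain`) are hypothetical.

```python
# Polynomials are dicts {exponent tuple: coefficient}; position i of the
# tuple is the exponent of y_{i+1}, with y_1 < y_2 < ... (assumed encoding).

def degree(poly, i):
    """Degree of poly in the variable at 0-based position i."""
    return max((e[i] for e in poly), default=0)

def poly_class(poly):
    """Largest 1-based index of a variable occurring in poly (0 for constants)."""
    return max((i + 1 for e in poly for i, a in enumerate(e) if a > 0), default=0)

def rank(poly):
    """Rank key: (class, degree in the leading variable)."""
    c = poly_class(poly)
    return (c, degree(poly, c - 1) if c else 0)

def is_reduced(q, p):
    """q is reduced w.r.t. p (of class > 0): lower degree in p's leading variable."""
    c = poly_class(p)
    return degree(q, c - 1) < degree(p, c - 1)

def lowest_rank_chain(polys):
    """Greedy selection: pick a minimal polynomial, drop everything it reduces."""
    remaining = [p for p in polys if p]          # ignore the zero polynomial
    chain = []
    while remaining:
        c = min(remaining, key=rank)
        chain.append(c)
        if poly_class(c) == 0:                   # a nonzero constant is a chain by itself
            break
        remaining = [q for q in remaining if is_reduced(q, c)]
    return chain

# Variables x < y < z, exponent tuples (ex, ey, ez).
p1 = {(2, 0, 0): 1, (0, 0, 0): -4}                            # x^2 - 4
p2 = {(0, 2, 0): 1, (0, 0, 0): -9}                            # y^2 - 9
p3 = {(1, 0, 2): 1, (0, 1, 2): 1, (0, 0, 2): 5, (0, 0, 0): 1} # (x + y + 5) z^2 + 1
extra = {(3, 1, 0): 1}                                        # x^3 y, not reduced w.r.t. p1
chain = lowest_rank_chain([p3, extra, p2, p1])
```

Only degrees and classes are inspected here; no polynomial arithmetic is needed to select a chain of lowest rank from a finite set.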

In  Chapter  4  of  his  book  (pp.  88-90),  Ritt  gave  a  characterization  for  characteristic  sets 
of  prime  polynomial  ideals. 

Theorem 2 (Ritt) A necessary and sufficient condition for a chain $C = p_1, \ldots, p_n$ to
be a characteristic set of a prime ideal over $k[u_1, \ldots, u_m, x_1, \ldots, x_n]$, where
$\{u_1, \ldots, u_m, x_1, \ldots, x_n\} = \{y_1, \ldots, y_l\}$, $m + n = l$, and
$\{u_1, \ldots, u_m\}$ are transcendentals, is that for each $0 < i < n$, $p_{i+1}$ is
irreducible (cannot be factored) over the extension field $K_i$ obtained by adjoining the
zero of $p_i$ to the extension field $K_{i-1}$. The polynomial $p_1$ should be irreducible
over the field of rational functions $k(u_1, \ldots, u_m)$. The extension field $K_1$ is
obtained by adjoining the zero of $p_1$ to $k(u_1, \ldots, u_m)$.


It is also the case that a polynomial p is in a prime ideal I if and only if it
pseudo-divides to 0 using a characteristic set of I.

In  the  same  chapter  (pp.  95-96),  Ritt  also  gave  an  algorithm  for  constructing  from  a 
finite  S,  characteristic  sets  of  a  finite  number  of  prime  ideals  whose  manifolds  (irreducible 
varieties)  make  up  the  manifold  (variety)  of  S.  This  was  done  by  developing  the  above 
characterization  theorem  for  characteristic  sets  associated  with  prime  ideals. 

¹ We cannot require that only those polynomials which pseudo-divide to 0 by $c_1$ be
deleted, as there can be polynomials in S which neither pseudo-divide to 0 using $c_1$, nor
are reduced with respect to $c_1$.
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2.1  Background:  Definitions 

Below  we  first  reproduce  the  following  definitions  from  Ritt’s  book. 

Let $y_1, \ldots, y_n$ be the set of indeterminates with the total ordering
$y_1 < \cdots < y_n$. Consider the polynomial ring $R = k[y_1, \ldots, y_n]$. By the class of
a polynomial p in R, we mean the maximum i such that $y_i$ appears in p. If p is in k, then
the class of p is said to be 0. For a polynomial p with class i > 0, $y_i$ is called its
leading variable or indeterminate. The initial of p is the coefficient of the highest term
in $y_i$ when p is viewed as a univariate polynomial in $y_i$.

A polynomial p is said to be of higher rank than q with respect to $y_i$ if the degree of
$y_i$ in p is greater than the degree of $y_i$ in q. A polynomial p is said to be of higher
rank than q, denoted as $p \succ q$, if p is of higher class than q, or p and q are of the
same class i > 0 and p is of higher rank than q with respect to $y_i$. Obviously, if p is of
higher rank than q, then q is of lower rank than p. If neither p is of higher rank than q
nor q is of higher rank than p, then p and q are said to be of the same rank.²
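
As a small worked illustration of these definitions (the polynomials below are chosen here for illustration and do not come from Ritt's book), take two indeterminates with the ordering $x < y$, so that $x$ plays the role of $y_1$ and $y$ the role of $y_2$:

```latex
\begin{align*}
p &= (x+1)\,y^{2} + x   && \text{class } 2,\ \text{leading variable } y,\ \text{initial } x+1,\ \deg_y p = 2,\\
q &= x^{3}\,y - 1       && \text{class } 2,\ \text{leading variable } y,\ \text{initial } x^{3},\ \deg_y q = 1,\\
r &= x^{5} + 2          && \text{class } 1,\ \text{leading variable } x,\ \text{initial } 1,\ \deg_x r = 5,\\
c &= 7                  && \text{class } 0.
\end{align*}
% Hence p \succ q \succ r \succ c: class is compared first, and the degree in the
% leading variable only breaks ties between polynomials of the same class.
% The polynomials q and 5y + x^{7} are of the same rank (class 2, degree 1 in y).
```

Note that the high degree of $x$ in $r$ does not matter when $r$ is compared with $q$, because $r$ is of lower class.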

If p is of class i > 0, then q is said to be reduced with respect to p if q is of lower
rank than p in $y_i$.

A sequence of polynomials $p_1, \ldots, p_k$ is called a chain (or equivalently, is said to
be in triangular form) if either

(i) k = 1, and $p_1 \neq 0$, or

(ii) k > 1, the class of $p_1$ is > 0, and for j > i, $p_j$ is of higher rank than $p_i$ and
reduced with respect to $p_i$.

Of course, $k \le n$. It is easy to see that every polynomial in a chain introduces at least
one new variable. That is, if $C = p_1, \ldots, p_k$ is a chain, then the leading variable
of $p_i$ is less than the leading variable of $p_j$, for all $1 \le i < j \le k$.

A chain $C_1 = p_1, \ldots, p_k$ is said to be of higher rank than another chain
$C_2 = q_1, \ldots, q_l$, written as $C_1 \succ C_2$, if either

(i) there is an $i \le k, l$ such that for all j < i, $p_j$ and $q_j$ are of the same rank,
and $p_i$ is of higher rank than $q_i$, or

(ii) l > k and $p_i$ and $q_i$ are of the same rank for $i \le k$.

Two  chains  such  that  neither  is  of  higher  rank  than  the  other,  are  said  to  be  of  the  same 
rank. 

Given  a  set  of  chains,  Ritt  showed  how  to  find  a  chain  which  is  not  of  higher  rank  than 
any  other  chain  in  the  set. 

And,  a  characteristic  set  of  S,  as  stated  above,  is  defined  to  be  a  chain  in  S  of  a  lowest 
rank. 
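
The ordering on chains can be sketched as follows. The code is a hedged illustration: it assumes each chain has already been summarized as a list of (class, degree in the leading variable) pairs, one pair per polynomial, and the function names `higher_rank` and `lowest_chain` are hypothetical.

```python
def higher_rank(c1, c2):
    """True if chain c1 is of higher rank than chain c2.

    Each chain is a list of (class, degree) pairs; Python's tuple comparison
    matches the rank ordering on polynomials (class first, then degree).
    """
    for r1, r2 in zip(c1, c2):
        if r1 != r2:
            # Case (i): first position where the ranks differ decides.
            return r1 > r2
    # Case (ii): all compared ranks agree, so the shorter chain is the higher one.
    return len(c2) > len(c1)

def lowest_chain(chains):
    """Return a chain that is not of higher rank than any other chain in the list."""
    best = chains[0]
    for c in chains[1:]:
        if higher_rank(best, c):
            best = c
    return best

short = [(1, 2)]
longer = [(1, 2), (2, 1)]
steep = [(1, 3)]
```

With these sample values, `longer` extends `short` by one polynomial and is therefore of lower rank, which is why a characteristic set tends to be a chain with as many elements as possible.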

A polynomial q is reduced with respect to a chain $C = p_1, \ldots, p_k$ if q is reduced
with respect to $p_i$ for each $1 \le i \le k$. This implies that for each $1 \le i \le k$,
the degree of $y_j$ in q is lower than the degree of $y_j$ in $p_i$, where the class of
$p_i$ is j.

A polynomial q reduces to q' using a chain $C = p_1, \ldots, p_k$ if there exist
$e_1, \ldots, e_k$ and $a_1, \ldots, a_k$ such that

$$ i_1^{e_1} \cdots i_k^{e_k} \, q = a_1 p_1 + \cdots + a_k p_k + q', $$

where $i_j$ denotes the initial of $p_j$. It is preferable to take minimal
$e_1, \ldots, e_k$. The polynomial q' is typically computed by pseudo-dividing q by $p_k$
using minimal $e_k$, then pseudo-dividing the remainder by $p_{k-1}$, and finally

² In case p and q are of the same class i > 0 and the degrees of $y_i$ in p and q are the
same, it is possible to further refine this relation by considering lower variables.
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pseudo-dividing the result by $p_1$ using minimal $e_1$. It is easy to see that q' is
reduced with respect to C. It is also easy to see that $p_j \succ a_i$ for all j > i,
$1 \le i \le k$. Also, q' is in the ideal generated by the basis consisting of C and q. We
also call q' the normal form of q with respect to C.
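
Pseudo-division by a chain, as described above, can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than an efficient implementation: polynomials have integer coefficients and are encoded as dictionaries from exponent tuples to coefficients, the divisor is assumed to have positive degree in the chosen variable, and all function names (`prem`, `reduce_by_chain`, and the helpers) are hypothetical.

```python
from collections import defaultdict

def add(p, q, sign=1):
    """Return p + sign*q, dropping zero coefficients."""
    r = defaultdict(int, p)
    for e, c in q.items():
        r[e] += sign * c
    return {e: c for e, c in r.items() if c != 0}

def mul(p, q):
    r = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            r[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return {e: c for e, c in r.items() if c != 0}

def deg(p, v):
    return max((e[v] for e in p), default=-1)

def lead_coeff(p, v):
    """Coefficient of the highest power of variable v, as a polynomial free of v."""
    d = deg(p, v)
    return {e[:v] + (0,) + e[v + 1:]: c for e, c in p.items() if e[v] == d}

def x_pow(v, k, nvars):
    e = [0] * nvars
    e[v] = k
    return {tuple(e): 1}

def prem(q, p, v):
    """Pseudo-remainder of q by p in variable v (p has positive degree in v).

    Each step multiplies by the initial of p only when a cancellation is
    needed, so the power of the initial that is used is minimal.
    """
    nvars = len(next(iter(p)))
    dp, init = deg(p, v), lead_coeff(p, v)
    r = dict(q)
    while r and deg(r, v) >= dp:
        shift = x_pow(v, deg(r, v) - dp, nvars)
        r = add(mul(init, r), mul(mul(lead_coeff(r, v), shift), p), sign=-1)
    return r

def poly_class(p):
    return max((i + 1 for e in p for i, a in enumerate(e) if a > 0), default=0)

def reduce_by_chain(q, chain):
    """Pseudo-divide by p_k, then p_{k-1}, ..., finally p_1."""
    r = dict(q)
    for p in reversed(chain):
        r = prem(r, p, poly_class(p) - 1)
    return r

# Variables x < y, exponent tuples (ex, ey).
c1 = {(2, 0): 1, (1, 0): -2, (0, 0): 1}              # (x - 1)^2
c2 = {(1, 1): 1, (0, 1): -1, (1, 0): 1, (0, 0): -1}  # (x - 1) y + (x - 1)
chain = [c1, c2]
y_plus_1 = {(0, 1): 1, (0, 0): 1}
y_alone = {(0, 1): 1}
```

Here `reduce_by_chain(y_plus_1, chain)` is the zero polynomial, since $(x-1)(y+1)$ equals the second chain element, whereas `reduce_by_chain(y_alone, chain)` is the non-zero normal form $1 - x$.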

2.2  Characteristic  Set  a  la  Wu 

Since 1984, Wu has popularized Ritt's concept of a characteristic set and Ritt's algorithm,
given on pp. 95-96 of his book, for constructing a characteristic set by showing its
applicability to equation solving in general and geometry theorem proving in particular. Wu
made important contributions in computing characteristic sets efficiently as well as
associating a characteristic set with quasi-varieties. Wu's papers have focused on
constructing characteristic sets from a finite set of polynomials, with the crucial
distinction being that the ideal generated by such a finite set need not be a prime ideal.

According to Wu's definition of a characteristic set, a characteristic set of S need not
be in S. Thus Wu's definition of a characteristic set appears to be somewhat different from
Ritt's definition of a characteristic set. Instead, a characteristic set of S as defined by
Wu is a chain of the lowest rank from a larger set containing S, where the additional
elements are obtained from S by pseudo-division. It turns out that all these additional
elements are in the ideal generated by S. But a characteristic set of S using Wu's
algorithm is not necessarily a characteristic set of the ideal generated by S either, since
it does not always include minimal elements in the ideal.

Recently Wu (private communication, 1992) has suggested that a distinction ought to be
made between two different notions of a characteristic set: one associated with a
polynomial set and another associated with an ideal generated by a polynomial set. This
distinction is somewhat unclear to us. Given a set of polynomials S, it is not clear what it
means to associate a characteristic set with S. It is clearly not a chain of lowest rank in
S, as then not every polynomial in S pseudo-divides to 0 using this chain. Is it a chain of
the lowest rank from a set containing S (and perhaps, the set is closed under some other
operations)? An interesting question comes up: What set is it? Is there an algebraic and/or
geometric characterization of such a set?

3  Properties  of  Characteristic  Sets 

Henceforth, we associate a characteristic set with a polynomial ideal over
$k[y_1, \ldots, y_l]$. Given a basis B, let I = (B), the ideal generated by B. The set
$\{y_1, \ldots, y_l\}$ can be classified into two subsets: the set of transcendentals,
$\{u_1, \ldots, u_m\}$, and the remaining subset of indeterminates $\{x_1, \ldots, x_n\}$,
such that m + n = l; obviously, no polynomial in the $u_i$ alone is in I, and for every
$x_i$, there is a polynomial in $x_i$ in I. We will also assume the ordering
$u_1 < \cdots < u_m < x_1 < \cdots < x_n$. As defined by Ritt, a characteristic set of I is a
chain of the lowest rank in I. Let $C = p_1, \ldots, p_n$ be a characteristic set of I. It
is easy to see that no non-zero polynomial in I is reduced with respect to C.

Theorem  3  Every  polynomial  in  I  pseudo-divides  to  0  using  a  characteristic  set  C  of  I. 

Proof. By contradiction. Let q be in I such that q does not pseudo-divide to 0 using C.
Without any loss of generality we can assume that q is minimal. Let q' be the remainder
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of q when reduced by C; obviously $q' \neq 0$. By the definition of pseudo-division, q'
belongs to I since q and C are from I. Let the class of q' be j. Let $p_i$ be the
polynomial in C whose class is j; if there is no such polynomial in C, then
$C' = p_1, \ldots, p_i, q', p_{i+1}, \ldots, p_n$ is a chain of lower rank than C in I, where
the class of $p_i$ is lower than j and the class of $p_{i+1}$ is higher than j, which is a
contradiction to the assumption that C is a characteristic set of I. If such a $p_i$ exists,
then $C'' = p_1, \ldots, p_{i-1}, q', p_{i+1}, \ldots, p_n$ is a chain of lower rank than C
in I, which is a contradiction.

□ 

The  converse  is,  however,  not  true.  For  prime  ideals,  it  is  the  case  that  if  a  polynomial 
pseudo-divides  to  0  using  a  characteristic  set  of  a  prime  ideal,  then  the  polynomial  is  in  the 
prime  ideal.  For  prime  ideals,  characteristic  sets  can  be  used  for  ideal  membership  test. 

Below, we give examples of characteristic sets associated with ideals. It is assumed that
x < y.

Example 1: Consider the ideal $I_1$ generated by $B_1 = \{(x-1)^2, (x-1)y + 1\}$ over
Q[x, y]. The reader can verify that $B_1$ is a characteristic set of $B_1$ using Wu's
definition. However, $B_1$ is not a characteristic set of $I_1$ since $I_1$ is the trivial
ideal, and any non-zero element of Q is a characteristic set of $I_1$.
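
One way to see that the ideal in Example 1 is trivial is the following explicit combination of the two generators; it is a verification worked out here, not one given in the text:

```latex
\begin{align*}
\bigl(1 - (x-1)\,y\bigr)\bigl((x-1)\,y + 1\bigr) + y^{2}\,(x-1)^{2}
  &= 1 - (x-1)^{2}y^{2} + (x-1)^{2}y^{2} \\
  &= 1 .
\end{align*}
% Hence 1 lies in the ideal generated by (x-1)^2 and (x-1)y + 1, so the ideal is
% the whole ring, and a non-zero constant is a chain of lowest rank in it.
```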

Example 2: The ideal $I_2$ generated by $B_2 = \{(x-1)^2, (x-1)y + (x-1)\}$ has $B_2$ as
its characteristic set ($B_2$ is also a characteristic set in Wu's sense).

Example 3: Consider the ideal $I_3$ generated by
$B_3 = \{(x^2 - 4), (y^2 - 9), (x + y + 5)z^2 + 1\}$. $B_3$ is a characteristic set of $B_3$
using Wu's definition. But $B_3$ is not a characteristic set of $I_3$. Instead,
$\{(x^2 - 4), (x - 2)(y - 3), (x - 2)(6z^2 + 1)\}$ is a characteristic set of $I_3$.

A characteristic set as defined by Ritt should also serve as a characteristic set in Wu's
sense. The requirement on a characteristic set as defined by Wu is that (i) it is a chain,
(ii) every polynomial in the input set pseudo-divides to 0 using it, and (iii) the zeros
(the variety) of the input set are the union of the quasi-variety of the characteristic set
(i.e., the zeros of the polynomials in the characteristic set on which the product of the
initials does not vanish) and the variety of the input set augmented with the product of
the initials.

We now develop a characterization theorem for characteristic sets of polynomial ideals
analogous to the theorem that Ritt proved about characteristic sets for prime ideals. For
that we need the following operations with respect to a chain. We assume that the
transcendental elements $u_1, \ldots, u_m$ are known.

3.1  Invertibility  with  respect  to  an  ideal 

Let I be an ideal generated by a basis $(p_1, \ldots, p_i)$. A polynomial q is said to be
invertible with respect to an ideal I if there exist polynomials $a_1, \ldots, a_i$ and
$q^{-1}$ such that

$$ q^{-1} q = a_1 p_1 + \cdots + a_i p_i + r, $$

where r is a non-zero polynomial in the transcendentals. The polynomial $q^{-1}$ is called
an inverse of q with respect to I in the quotient ring
$k[u_1, \ldots, u_m, x_1, \ldots, x_n]/I$.

It is easy to see that q being invertible with respect to I means that q and I do not have
any common zero in any algebraically closed extension of the field $k(u_1, \ldots, u_m)$
(this follows from Hilbert's Nullstellensatz). Hence, a polynomial q is not invertible with
respect to I if there do not exist $r, a_1, \ldots, a_i$ satisfying the above properties.
This also implies that q and I have a common zero (again by Hilbert's Nullstellensatz).


If a basis of I is a chain C, then the class of $p_i$ above is lower than or equal to the
class of q, and the inverse $q^{-1}$ of q is of lower rank than $p_i$. Then, we will also
say that q is invertible (or not invertible) with respect to C.

If a polynomial q is not invertible with respect to C, then it is possible to find a
polynomial d such that

$$ d q = a_1 p_1 + \cdots + a_i p_i, $$

where the class of $p_i$ is lower than or equal to the class of q, and d is of lower rank
than $p_i$. The polynomial d is a zero divisor in the quotient ring
$k[x_1, \ldots, x_n]/(C)$, and will be called an annihilator of q with respect to C.

A polynomial r divides another polynomial s with respect to an ideal I if and only if there
exist q and $a_i$'s such that

$$ q r = a_0 s + a_1 p_1 + \cdots + a_i p_i, $$

where $a_0$ is a polynomial in the transcendentals, if any.

If a chain C is used as a basis for I and r divides another polynomial s with respect to C,
then the class of $p_i$ is not higher than the class of s, and $a_0$ is a polynomial in the
transcendentals, if any.

The invertibility check, the computation of the $a_i$'s and r, as well as of d, can be done
by Collins' extended gcd (resultant) algorithm [7, 25]. This computation can also be
performed using pseudo-division, as well as using the D5 method [9].

A chain C is said to be invertible if the initial of every polynomial in C is invertible
with respect to C. Such a chain has been called a regular chain by Kalkbrener [16].

A chain C is said to be in canonical form if, for every polynomial p in C, (i) p is reduced
with respect to polynomials lower than p in C, and (ii) (a) the initial of p is either a
polynomial in the transcendentals or (b) the zero set of the initial is a subset of the zero
set of polynomials lower than p in C. The condition (ii) (b) is to ensure that an initial
that is not invertible does not have a zero that is not a zero of the polynomials lower than
it in the chain.

Example 4: A chain $\{(x-1)^2, (x-1)(x-2)y + (x-1)\}$ is not in canonical form, since the
initial of the second polynomial has a zero x = 2 which is not a zero of $(x-1)^2$. An
equivalent chain, $\{(x-1)^2, (x-1)y - x(x-1)\}$, is, however, in canonical form.
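
In the simplest setting, a chain consisting of a single univariate polynomial over the rationals with no transcendentals, the invertibility test and the annihilator of this subsection reduce to an extended Euclidean computation. The sketch below covers only that special case and is not Collins' algorithm or the D5 method cited above; polynomials are coefficient lists from the constant term upwards, and the function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (highest-degree end)."""
    while p and p[-1] == 0:
        p = p[:-1]
    return p

def sub(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([x - y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    r = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return trim(r)

def divmod_poly(a, b):
    """Quotient and remainder of a by a non-zero b over the rationals."""
    a = trim(list(a))
    quo = [Fraction(0)] * max(len(a) - len(b) + 1, 0)
    while len(a) >= len(b):
        c = a[-1] / b[-1]
        k = len(a) - len(b)
        quo[k] = c
        a = sub(a, mul([Fraction(0)] * k + [c], b))
    return trim(quo), a

def ext_gcd(a, b):
    """Return (g, s) with s*a congruent to g modulo b, where g is a gcd of a and b."""
    r0, r1 = trim(list(a)), trim(list(b))
    s0, s1 = [Fraction(1)], []
    while r1:
        quo, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, sub(s0, mul(quo, s1))
    return r0, s0

def inverse_or_annihilator(q, p):
    """Classify q with respect to the one-element chain {p}."""
    q = [Fraction(c) for c in q]
    p = [Fraction(c) for c in p]
    g, s = ext_gcd(q, p)
    if len(g) == 1:                       # gcd is a non-zero constant: q is invertible
        return "inverse", [c / g[0] for c in s]
    return "annihilator", divmod_poly(p, g)[0]   # d = p / gcd, so d*q lies in (p)

p = [1, -2, 1]                            # (x - 1)^2
kind1, inv = inverse_or_annihilator([-2, 1], p)   # q = x - 2
kind2, ann = inverse_or_annihilator([-1, 1], p)   # q = x - 1
```

For $q = x - 2$ the computed inverse modulo $(x-1)^2$ is $-x$; multiplying the second polynomial of the non-canonical chain in Example 4 by $-x$ and reducing modulo $(x-1)^2$ gives $(x-1)y - x(x-1)$, the canonical form stated there. For $q = x - 1$ no inverse exists, and the annihilator found is $x - 1$ itself.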

3.2  A  Characterization  Theorem  for  Characteristic  Sets  of  Polynomial 
Ideals 

Even if a characteristic set C of an ideal I is an invertible chain, it is not the case
that if a polynomial p pseudo-divides to 0 using C, then p is in I. Consider, for example,
the ideal I generated by C = {ux}, where u is a transcendental element. C is an invertible
chain. The polynomial x pseudo-divides to 0 but x is not in the ideal I. However, this is
true for zero-dimensional ideals.
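
The counterexample can be written out explicitly; the computation below is a short verification added for illustration:

```latex
\begin{align*}
\text{initial}(ux) &= u, \\
u^{1} \cdot x &= 1 \cdot (ux) + 0 ,
\end{align*}
% so x pseudo-divides to 0 using C = {ux}, with exponent e_1 = 1 and quotient a_1 = 1.
% On the other hand, every element of the ideal (ux) is a multiple of ux and therefore
% vanishes at the point u = 0, x = 1, where x itself takes the value 1.
% Hence x is not in (ux).
```

The initial $u$ is a non-zero polynomial in the transcendentals, which is why the chain counts as invertible even though $u$ is not a constant of $k$.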

3.3  Zero-dimensional  Ideals 

In this subsection, we prove a characterization theorem for zero-dimensional ideals. These
results are then generalized to positive-dimensional ideals. Throughout this subsection,
m = 0 and there are no transcendental elements.
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Theorem 4 Given an ideal I, if its characteristic set C is an invertible chain, then a
polynomial q is in I if and only if q pseudo-divides to 0 with respect to C.

Proof. Without any loss of generality, we can assume that the initial
$\mathrm{initial}(p_i)$ of $p_i$ in C is an element of k (if $\mathrm{initial}(p_i)$ is not
an element of k, then $p_i$ can be multiplied with the inverse of $\mathrm{initial}(p_i)$ to
obtain $p'_i$ whose initial is in k).

The only-if part follows from the second characterization theorem about characteristic
sets, i.e. every polynomial in I pseudo-divides to 0 with respect to a characteristic set C
of I.

If q pseudo-divides to 0 using C, we can write

$$ i_1^{e_1} \cdots i_k^{e_k} \, q = a_1 p_1 + \cdots + a_k p_k, $$

where $i_j$ is the initial of $p_j$. Since $i_j \in k$, we have $q \in (C)$ and hence
$q \in I$.

□ 

A characteristic set of an ideal may have polynomials whose initials are not invertible
with respect to the characteristic set.

Example 5: Let $I = ((x-1)^2, (x-1)y + (x-1))$. Under the ordering y > x, a Gröbner basis
using lexicographic ordering on terms [2, 3, 14, 18] is
$GB = \{(x-1)^2, (x-1)y + (x-1)\}$, from which it follows that C = GB. The initial of the
polynomial introducing y is x - 1, and x - 1 is not invertible with respect to C. However,
using the ordering x > y, a characteristic set is $\{(y+1)x - (y+1)\}$.

There are ideals for which, no matter what ordering on indeterminates is used, every
characteristic set contains a polynomial whose initial is not invertible.

Example 6: Consider $I = ((x-1)(x-2), (x-1)(y-1), (y-1)(y+1))$. Using the ordering x > y,
$C_1 = \{y^2 - 1, (y-1)x - (y-1)\}$ is a characteristic set of I. Using the ordering y > x,
$C_2 = \{x^2 - 3x + 2, (x-1)y - (x-1)\}$ is a characteristic set of I.

Theorem 5 If a polynomial $p_i$ is in a characteristic set C and $\mathrm{initial}(p_i)$ is
not invertible with respect to C, then $\mathrm{initial}(p_i)$ must divide $p_i$ modulo the
chain $\{p_1, \ldots, p_{i-1}\}$.
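
As an illustration of the statement, the characteristic set $C_1$ of Example 6 (ordering x > y, so y is the lower variable) can be checked by hand; the verification below is added here and is not part of the original argument:

```latex
\begin{align*}
p_1 &= y^{2} - 1, \qquad p_2 = (y-1)\,x - (y-1), \qquad \text{initial}(p_2) = y - 1, \\
(y+1)\,(y-1) &= 1 \cdot p_1
   && \text{so } y-1 \text{ is not invertible; } y+1 \text{ is an annihilator,} \\
p_2 &= (y-1)\,(x-1)
   && \text{so the initial } y-1 \text{ divides } p_2 .
\end{align*}
% In the notation of Section 3.1, the division relation q r = a_0 s + a_1 p_1 holds with
% r = y - 1, s = p_2, q = x - 1, a_0 = 1 and a_1 = 0.
```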

Proof Sketch. Let $p_i = \mathrm{initial}(p_i)\, x_i^{d_i} + p'$, and
$I_p = \mathrm{initial}(p_i)$. Since $I_p$ is not invertible with respect to C, there exist
$J_p$ and $a_1, \ldots, a_k$, k < i, such that

$$ J_p I_p = a_1 p_1 + \cdots + a_k p_k. $$

So, $J_p p_i = J_p (I_p x_i^{d_i} + p') = J_p I_p x_i^{d_i} + J_p p'$. Clearly $J_p p'$ is in
the ideal (C). Further, $J_p$ does not involve $x_i$, thus implying that $J_p p'$ is of lower
rank than $p_i$. The coefficient of every lower degree (> 0) term in $x_i$ in $J_p p'$ must
pseudo-divide to 0 using C, as otherwise a chain in (C) in which $J_p p'$ replaces $p_i$
exists and is smaller than C, contradicting the assumption that C is a characteristic set of
(C). The coefficient of $x_i^0$ in $J_p p'$ must also pseudo-divide to 0 using C, as
otherwise there exists a chain smaller than C in (C) in which a polynomial lower than $p_i$
in C is replaced by a smaller polynomial.

Consider any such coefficient, say c, of a term in $x_i$ in $J_p p'$. Since it
pseudo-divides to 0 using elements lower than $p_i$ in C, we have:

$$ i_1^{e_1} \cdots i_k^{e_k} \, c = a_1 p_1 + \cdots + a_k p_k, $$
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Without any loss of generality, we can assume that $i_j$ is either an element of k or a
polynomial that is not invertible with respect to C.

If every polynomial lower than $p_i$ in C has an invertible initial, then each $i_j$ above
is in k, and hence c is in the ideal $(p_1, \ldots, p_k)$. Since $c = J_p c'$, where c' is
the term in $p_i$ without $x_i$, $I_p$ divides c'.

In the general case, we can assume that for every polynomial smaller than p in C, its
initial divides the polynomial. There are two possibilities for $I_p$ to be not invertible
with respect to C: $I_p$ vanishes on a zero of some initial of a polynomial lower than p, or
$I_p$ vanishes on a zero of polynomials lower than p which is not a zero of any of the
initials, or both.

Any zero of an initial used on the left side is a zero of the right hand side. Since an
initial that is not invertible with respect to C has a zero in common with C, implying a
zero of p, the irreducible factors (primitive part) of the initial divide p. Any zero of c
is also a zero of the right hand side.

□ 


3.4  Positive-Dimensional  Ideals 

Now  we  are  able  to  state  a  characterization  theorem  for  characteristic  sets  of  polynomial 
ideals. 

Lemma 1 Given an ideal I and a characteristic set C of I such that C is an invertible
chain, if p pseudo-divides to 0 then p is in the ideal generated by
$C' = p'_1, \ldots, p'_n$, where $p'_1 = p_1/\mathrm{content}(p_1)$, the content of $p_1$
being the gcd of the coefficients of $x_1$ terms in $p_1$, and for i > 1,
$p'_i = p_i/\mathrm{content}(p_i)$, where the content of $p_i$ is defined to be the gcd of
the coefficients of $x_i$ terms with respect to the chain $p_1, \ldots, p_{i-1}$.

Theorem  6  A  chain  C  is  a  characteristic  set  of  a  polynomial  ideal  if  and  only  if  for  every 
polynomial  p  in  C ,  either  its  initial  Ip  is  invertible  with  respect  to  C ,  or  Ip  divides  p  with 
respect  to  C. 

Proof. The only-if part follows from Theorem 5 above.

We prove the if part by proving that C is a characteristic set of I = (C). Without any loss
of generality we can assume that the initial of every polynomial in C is either a
polynomial in transcendentals or not invertible with respect to C. In either case, the
initial of a polynomial $p_i$ in C divides $p_i$ with respect to polynomials lower than
$p_i$ in C. From C, we can generate an invertible characteristic set
$C' = p'_1, \ldots, p'_k$ as follows: $p'_j = p_j / i_{p_j}$,
$i_{p_j} = \mathrm{initial}(p_j)$, $1 \le j \le k$.

The proof is by contradiction. Assume that C is not a characteristic set of I; then there
must exist a non-zero polynomial, say q, in I such that q does not pseudo-divide to 0 using
C, as that would suggest that a chain smaller than C exists in I. Without any loss of
generality, we can assume that q is reduced with respect to C (this is so because
pseudo-division of a polynomial in an ideal by another polynomial in the ideal produces a
polynomial in the ideal also).

Since q is in (C), there exist $a_i$'s such that

$$ q = a_1 p_1 + \cdots + a_k p_k. $$
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so  the  above  relation  becomes 


$$ q = a_1 i_{p_1} p'_1 + \cdots + a_k i_{p_k} p'_k. $$

It is possible to pseudo-divide $a_j i_{p_j}$ by C to obtain a remainder $b_j$ such that

$$ a_j i_{p_j} = a_{j,1} p'_1 + \cdots + a_{j,k} p'_k + b_j, $$

where $b_j$ is reduced with respect to C and $a_{j,i} \prec p'_l$ for all
$1 \le i < l \le k$. Substituting for each $a_j i_{p_j}$, we get

$$ q = d_1 p'_1 + \cdots + d_k p'_k, $$

where $d_i \prec p_j$ for all $1 \le i < j \le k$. Let the class of q be j > 0; consider the
coefficient of the highest term in $x_j$ in q viewed as a univariate polynomial in $x_j$.
The coefficients of all terms involving a variable > $x_j$ on the right hand side must be 0.
From this, it follows that q is not reduced with respect to C, whereas q was assumed to be
reduced with respect to C, which is a contradiction.

So,  C  is  a  characteristic  set  of  (C). 

□ 

In  fact,  C  serves  as  a  characteristic  set  of  many  ideals  containing  /  as  a  subideal. 

Every  characteristic  set  has  a  non-empty  zero  set. 

For an invertible characteristic set C, its zero set is clearly the zero set of (C).


4  An  Algorithm  for  Computing  a  Characteristic  Set  of  an 
Ideal 

Since a characteristic set of an ideal consists of certain minimal elements in the ideal,
it can be easily extracted from a Gröbner basis of the ideal. This is precisely what
Kandri-Rody showed in his Ph.D. dissertation [17]. Kandri-Rody gave an algorithm for
computing a characteristic set of a polynomial ideal using Buchberger's Gröbner basis
algorithm. From a basis B of an ideal I and a total ordering on indeterminates, compute its
reduced Gröbner basis using lexicographic ordering. From the Gröbner basis, extract a
minimal chain of polynomials and call it C. Kandri-Rody showed that C is a characteristic
set by showing that every polynomial in I pseudo-divides to 0 using C.

This algorithm is, however, inefficient in practice, since computing a lexicographic
Gröbner basis is expensive (except for the zero-dimensional case, where the basis
conversion algorithm of Gianni et al. can be used to obtain a lexicographic Gröbner basis
from a total degree Gröbner basis).

Below we give a direct algorithm for computing a characteristic set of a polynomial ideal
specified by its basis B. The algorithm does not need to compute a Gröbner basis of a
polynomial ideal. It is a modification of Ritt's algorithm given on p. 95 of his book.

Let Σ = B and C = ∅.

Repeat the following steps until every polynomial in Σ pseudo-divides to 0 using C.

1. Find a minimal chain C in Σ. Let Σ_0 = Σ, C = ∅, i = 0. This is done by repeating
the following two steps.

(a) Remove the smallest polynomial in Σ_i, say p. If p is a polynomial in the
transcendentals and the lowest indeterminate, then factor it and check which factor is in the
ideal I.

If not, check whether the initial init of p is invertible or not with respect to C. If init
is invertible, then extend C to include p (p could be multiplied by the inverse of init and
pseudo-divided by C to get a remainder which is included in C instead of p). If not, then
check whether init is a factor of p with respect to C. If yes, then extend C to include p.

If init is not invertible and is also not a factor of p, then find a zero divisor zd of init
with respect to C, multiply p by it, and pseudo-divide the product by C.

It can be easily shown that the initial of the result is invertible.

Augment C to include the result. Let c_i be the new polynomial added to C. Replace p
in Σ by c_i also.

It is possible that in this step, we may get a polynomial in a lower variable which may
result in removing an element from C that introduces the lower variable.

(b) Remove from Σ_i all polynomials which are not reduced with respect to the new
polynomial c_i. Let Σ_{i+1} stand for the remaining subset of polynomials. Also increment i.

After this step, the initial of every polynomial in the chain is either invertible or a factor
of the polynomial.

2. Once a minimal chain C is identified, pseudo-divide every polynomial in Σ with
respect to C. (This can be done using the division algorithm or Collins’ subresultant
algorithm [1, 6, 7, 25].) If all remainders are 0, then C is a characteristic set of (B).
Otherwise, augment Σ to include the non-zero remainders.

3.  Repeat  steps  1  and  2. 
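
The pseudo-division used in step 2 can be illustrated in the simplest setting. The following is a minimal sketch, assuming univariate polynomials with integer coefficients stored as lists (highest degree first); the function name and the representation are illustrative choices, not part of the algorithm as stated, and the multivariate case with respect to a chain is not covered.

```python
def pseudo_remainder(f, g):
    """Pseudo-remainder of f by g for univariate integer polynomials.

    Polynomials are lists of integer coefficients, highest degree first.
    The result r satisfies
        lc(g)**(deg f - deg g + 1) * f = q * g + r,   deg r < deg g,
    and is computed without any division, so it stays in the integers.
    """
    if not g or g[0] == 0:
        raise ValueError("g must have a non-zero leading coefficient")
    r = list(f)
    lead = g[0]                        # the initial (leading coefficient) of g
    while len(r) >= len(g):
        c = r[0]
        r = [lead * a for a in r]      # scale by the initial of g
        for i, b in enumerate(g):      # cancel the leading term
            r[i] -= c * b
        r.pop(0)                       # leading entry is now 0
    while r and r[0] == 0:             # strip leading zeros
        r.pop(0)
    return r


# x^2 - 3x + 2 pseudo-divides to 0 by x - 1 (empty list = zero polynomial).
print(pseudo_remainder([1, -3, 2], [1, -1]))   # []
# 2^2 * (x^2 + 1) = (2x - 1)(2x + 1) + 5
print(pseudo_remainder([1, 0, 1], [2, 1]))     # [5]
```

Multiplying by the initial at every step is what keeps the computation free of fractions; the price is that a zero pseudo-remainder only shows membership up to a power of the initial.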

Optimizations suggested by Wu and his colleagues (such as using subresultant
computation to do multi-step pseudo-division as well as skipping some remainders) can be easily
incorporated into the above algorithm.

Theorem 7 The above algorithm gives a characteristic set C of I = (B).

Proof. If the algorithm stops after j iterations, then C is clearly a minimal chain in Σ.

In each iteration, Σ is modified by the following two operations. First, replace a
polynomial p in Σ by another lower polynomial c_i obtained by multiplying p by a zero divisor
with respect to a chain C at hand which annuls the initial of p. Second, augment Σ with
non-zero remainders with respect to C. Each of these operations results in a polynomial in
(Σ), and hence in (B).

We need to prove that every polynomial in I pseudo-divides to 0 using C. This can be
shown by proving that C constitutes a minimal chain and using Theorem 1. It is easy to see
that except for the least polynomial in C, no polynomial in variables of class higher than
m + 1 in I is of lower rank than the polynomial of the same class in C. For the least
polynomial in C, we make an explicit check to ensure that it is a polynomial of smallest
degree in the transcendentals and x_1 in I. Hence C is a characteristic set.

□ 

There exists an ideal I such that, no matter what ordering is used, the above algorithm
gives a characteristic set C of I in which initials of polynomials are not invertible. However,
there exists a characteristic set of I consisting of polynomials with invertible initials.

Consider I = ((x - 1)(y - 1), (x - 2)(y - 2)). Using y > x, the above algorithm gives the
characteristic set C_1 = {(x - 2)y - 2(x - 2), (x - 2)(x - 1)}. Using x > y, the characteristic
set computed by the algorithm is C_2 = {(y - 1)x - (y - 1), (y - 1)(y - 2)}. The polynomial
x + y - 3 belongs to I, thus giving other characteristic sets in which x + y - 3 is the smallest
polynomial no matter what ordering is used.
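
That x + y - 3 lies in this ideal can be checked directly by subtracting the two generators. The short derivation below is an added verification, not part of the original argument:

```latex
\begin{align*}
(x-1)(y-1) - (x-2)(y-2)
  &= (xy - x - y + 1) - (xy - 2x - 2y + 4) \\
  &= x + y - 3 .
\end{align*}
```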

It  is  an  interesting  question  to  determine  additional  computational  steps  to  obtain  an 
invertible  characteristic  set  (if  one  exists)  for  an  ideal  with  respect  to  an  ordering. 

One obvious way is to compute an invertible characteristic set, in case it exists, from a
characteristic set using linear algebra techniques. The general structure of the polynomials
in an invertible characteristic set is known (i.e. the degree of each variable in the
polynomial). For the above example, it is known that the degree of the polynomial in y is 1, and
the degree of the polynomial in x is < 2. So, the polynomial must be of the form y + a_1 x + a_2.
This polynomial must also pseudo-divide to 0 using C_1, so 2(x - 2) + (x - 2)(a_1 x + a_2) must
pseudo-divide to 0 using (x - 2)(x - 1), which implies that a_1 = 1 and a_2 = -3.

In  general,  we  start  with  the  smallest  polynomial  in  the  chain  whose  initial  is  not 
invertible.  Let  its  class  be  i.  Using  the  degree  bounds  on  the  variables,  we  assume  a 
general  polynomial  satisfying  the  degree  bounds  with  unknown  coefficients  of  the  terms. 
We set up linear equations from the fact that the polynomial must pseudo-divide to 0 using
the  known  characteristic  set,  and  solve  for  the  coefficients.  If  there  is  a  solution  for  the 
coefficients,  the  desired  polynomial  is  determined;  otherwise,  there  is  no  polynomial  of  class 
i  with  an  invertible  initial.  In  either  case,  we  go  to  the  next  polynomial  in  the  chain  with 
an  initial  that  is  not  invertible,  and  so  on. 

5  Decomposing  a  Variety  into  Simpler  Varieties 

The variety of a characteristic set C of an ideal I includes, in general, the variety of the
ideal I. This is so because in general, (C) is a subideal of I.

If C is invertible, then I^e = (C) when viewed over k(u_1, ..., u_m). Then V(I) = V(C).

If C is not invertible, then V(I) ⊂ V(C).

A characteristic set C of an ideal I can be used to decompose V(I) into subvarieties
such that each subvariety has an invertible characteristic set associated with it.

Theorem 8 (Decomposition):

Given Σ, a system of polynomials, invertible characteristic sets C_1, ..., C_k can be
effectively constructed (without using factorization over algebraic extension fields) such that

Zeros(Σ) = Zeros(C_1) ∪ ... ∪ Zeros(C_k).

The decomposition theorem has many applications. For example, it can serve as a
basis for geometry theorem proving, determining zeros of polynomial equations, primary
decomposition, determining zeros of parametric polynomial equations, computing the dimension
of a variety (ideal), etc.

Note that this decomposition theorem is of a different character than a related theorem
given by Wu, in which the decomposition of the zero set of a set of polynomials (variety) is
given in terms of a finite union of quasi-varieties. In Wu’s decomposition theorem, there is
no guarantee that a quasi-variety is non-empty; extra computation is needed to ensure
non-emptiness of quasi-varieties. In the above decomposition, every subvariety is non-empty
since it is associated with an invertible characteristic set.

The following algorithm gives the above decomposition. The correctness proof of the
algorithm serves as a proof of the decomposition theorem.

1. Compute a characteristic set C^0 of I = (Σ). Let F = {C^0}.

2. Check whether every element in F is an invertible characteristic set. If yes, terminate.
If not, pick an element of F that is not invertible, say C^i in F, which includes a polynomial
whose initial is not invertible.

3. Find the smallest polynomial, say p^i, in C^i whose initial is not invertible. Given
that initial(p^i) divides p^i with respect to C^i, a decomposition of C^i can be obtained using
the D5 algorithm (subresultant computation), which determines that initial(p^i) is not invertible
with respect to C^i. Let the decomposition give a family of chains C^i_1, ..., C^i_j, each one of
them having exactly one of initial(p^i) and p^i/initial(p^i). Include in F the characteristic
sets of Σ ∪ C^i_q, for each 1 ≤ q ≤ j.

4. Repeat steps 2 and 3.

The algorithm stops when every element of F is an invertible characteristic set. The
family F serves as the decomposition. The variety V(I) is decomposed into the union of
subvarieties V(C^i), C^i ∈ F.

The correctness of the algorithm is based on the fact that given a chain C = p_1, ..., p_{i+1}
in an ideal I such that

(i) initial(p_{j+1}), 0 ≤ j < i, is invertible with respect to p_1, ..., p_j,

(ii) initial(p_{i+1}) is not invertible with respect to p_1, ..., p_i, and

(iii) initial(p_{i+1}) divides p_{i+1} with respect to p_1, ..., p_i,

then there exist ideals I_j = I ∪ (p_1^j, ..., p_{i+1}^j), 1 ≤ j ≤ l, such that

(i) p_{i+1}^j is either initial(p_{i+1}) or p_{i+1}/initial(p_{i+1}),

(ii) for 1 ≤ k ≤ i, p_k^j is a factor of p_k with respect to p_1, ..., p_{k-1}, and there exist j_1, ..., j_q
such that p_k = p_k^{j_1} * ... * p_k^{j_q} with respect to p_1, ..., p_{k-1}, and

(iii) p_k^j does not vanish on the zeros of p_1^j, ..., p_{k-1}^j.

The termination of the algorithm follows from the property that a characteristic set of
each of I_j is of lower rank than the characteristic set of I.

Examples: Consider the set Σ = {(x - 1)(x - 2), (x - 1)(y - 1), (y - 1)(y + 1)}. Assume
x > y. A characteristic set C_1 = {y^2 - 1, (y - 1)x - (y - 1)} can be constructed using
the algorithm given in the previous section. Since the initial of the second polynomial,
y - 1, is not invertible, we decompose C_1 to consider the new sets Σ ∪ {y + 1, x - 1} and
Σ ∪ C_1 ∪ {y - 1}, from which we can compute characteristic sets C_1^1 = {y + 1, x - 1} and
C_1^2 = {y - 1, x^2 - 3x + 2}, giving us a decomposition of the variety of the above set.
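
As a sanity check on this example, the zero set of the original system can be compared with the union of the zero sets of the two components. The sketch below does this by brute force over a small integer grid, which is enough here because all zeros of this particular system are integer points; the helper `zeros` and the names `comp_a`, `comp_b` are illustrative and not from the text.

```python
from itertools import product


def zeros(polys, grid):
    """All grid points (x, y) at which every polynomial in polys vanishes."""
    return {pt for pt in product(grid, repeat=2)
            if all(p(*pt) == 0 for p in polys)}


grid = range(-3, 4)

# The system {(x-1)(x-2), (x-1)(y-1), (y-1)(y+1)} from the example.
sigma = [lambda x, y: (x - 1) * (x - 2),
         lambda x, y: (x - 1) * (y - 1),
         lambda x, y: (y - 1) * (y + 1)]

# The two invertible characteristic sets obtained by the decomposition.
comp_a = [lambda x, y: y + 1, lambda x, y: x - 1]
comp_b = [lambda x, y: y - 1, lambda x, y: x**2 - 3 * x + 2]

whole = zeros(sigma, grid)
parts = zeros(comp_a, grid) | zeros(comp_b, grid)

print(sorted(whole))     # [(1, -1), (1, 1), (2, 1)]
print(whole == parts)    # True
```

A grid search is only a check on this one example, not a proof technique; for systems with non-integer or infinitely many zeros it would not be adequate.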

Depending  upon  a  variable  ordering,  we  may  get  different  decompositions  with  different 
number  of  components. 

Steps 1, 2, and 3 can be merged into the algorithm for computing a characteristic set
discussed  in  the  previous  section.  It  is  not  essential  to  compute  characteristic  sets  in  step 
3  (or  even  in  step  1).  At  any  point  in  the  computation,  if  a  polynomial  is  generated  such 
that 

(i)  it  is  a  candidate  for  inclusion  in  a  characteristic  set, 

(ii)  its  initial  is  not  invertible  with  respect  to  lower  polynomials,  and 

(iii)  its  initial  divides  the  polynomial, 

decomposition can be performed in the process of determining noninvertibility of the initial.

6 Computing dimension of a polynomial ideal

The dimension of a polynomial ideal I (equivalently, variety) can be obtained from a
decomposition of the variety associated with I as a family of invertible characteristic sets, as
discussed in the previous section.

Theorem 9 The dimension of the unmixed ideal I (and the associated variety) generated
by an invertible characteristic set C = p_1, ..., p_k is the difference of the number of variables
appearing in C and the size of the characteristic set.

Proof. Let m + 1 be the number of variables appearing in p_1. If m > 0, then no polynomial
in any of the m variables appearing in p_1 is in I (since such a polynomial cannot be
pseudo-divided to 0 by C). Let m + *2 be the number of variables appearing in p_2; clearly no
polynomial in

And, there are polynomials in m + 1 or more variables in I. So by the results in [22],
the dimension of I is m.

□ 

If  a  characteristic  set  C  is  not  invertible,  nothing  much  can  be  said  about  the  dimension 
of  an  ideal  that  has  C  as  a  characteristic  set.  However  the  following  is  true: 

Theorem 10 Given a characteristic set C, the dimension of the ideal (C) is the number
of variables in C minus the size of C plus the maximum number of polynomials in C whose
initials simultaneously vanish.

It is easy to see that the above theorem about the dimension of an invertible
characteristic set C is an immediate corollary of this theorem, since none of the initials of
polynomials in C can vanish.

Proof. Let {initial_{i_1}, ..., initial_{i_j}} be a maximal subset of initials of polynomials
in C that simultaneously vanish. It is easy to see that the variety associated with (C)
includes the subvariety V((C, initial_{i_1}, ..., initial_{i_j})). Now the dimension of this variety is
the number of variables plus j minus the size of C (a characteristic set corresponding to
(C, initial_{i_1}, ..., initial_{i_j}) can be proved to be invertible and its size is the size of C minus
j). It is easy to see that the variety of I does not include a subvariety of higher dimension.
□

The  following  property  is  helpful  to  determine  the  dimension  of  an  ideal. 

Lemma 2 Given a chain C = p_1, ..., p_i, p_{i+1} such that the initial of p_j, j < i + 1, is
invertible with respect to C and initial(p_{i+1}) of p_{i+1} divides p_{i+1} with respect to C, the
variety V((p_1, ..., p_i, p_{i+1})) is either of the same dimension as the dimension of
V((p_1, ..., p_i, p_{i+1}/initial(p_{i+1}))) or one dimension higher.

Proof.

V((p_1, ..., p_i, p_{i+1})) = V((p_1, ..., p_i, p_{i+1}/initial(p_{i+1}))) ∪ V((p_1, ..., p_i, initial(p_{i+1}))).

The dimension of V((p_1, ..., p_i, p_{i+1}/initial(p_{i+1}))) is n - (i + 1), where n is the
number of variables in C. The dimension of V((p_1, ..., p_i, initial(p_{i+1}))) is either 0 or the
dimension of V((p_1, ..., p_i)), depending upon whether initial(p_{i+1}) is invertible with
respect to p_1, ..., p_i. And, the dimension of V((p_1, ..., p_i)) is one more than the dimension
of V((p_1, ..., p_i, p_{i+1}/initial(p_{i+1}))).

□

Putting  all  these  together,  we  get: 

Theorem 11 Given an ideal I, let C_1, ..., C_j be the invertible characteristic sets associated
with unmixed ideals such that V(I) = V(C_1) ∪ ... ∪ V(C_j). Then, the dimension of I is the
maximum over the dimension of each (C_i).

Proof.  Follows  from  the  definition  of  the  dimension  of  an  ideal  and  the  above  theorems. 

Since many ideals can have a chain C satisfying the properties stated in section as
a characteristic set, it is not possible to compute the dimension of a polynomial ideal I
from its characteristic set C, especially if C is not invertible. Let n be the total number
of variables; let k be the size of the characteristic set C, in which k_1 polynomials have
invertible initials. If C is an invertible chain, i.e., k_1 = k, then the decomposition
will have only one component, so the dimension of I is n - k. In the case when the initial
of some polynomial in C is not invertible, i.e. k_1 < k, we get upper and lower bounds on
the dimension of I from the structure of C. The dimension cannot be lower than n - k
and cannot be more than n - k_1. These bounds are tight: it is possible to construct examples of
ideals and characteristic sets for which the dimension is any number between the lower
and upper bounds obtained from their characteristic sets, including the bounds
themselves.
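
The bounds described in this paragraph amount to simple arithmetic on three counts. A minimal sketch follows; the function name `dimension_bounds` and its argument names are illustrative, and the check `k1 <= k <= n` is an added assumption about well-formed input rather than something stated in the text.

```python
def dimension_bounds(n, k, k1):
    """Return (lower, upper) bounds on dim(I) read off a characteristic set.

    n  : total number of variables
    k  : number of polynomials in the characteristic set C
    k1 : number of polynomials in C whose initials are invertible
    """
    if not 0 <= k1 <= k <= n:
        raise ValueError("expected 0 <= k1 <= k <= n")
    return n - k, n - k1


# Invertible chain (k1 == k): the bounds coincide, so dim(I) = n - k.
print(dimension_bounds(3, 2, 2))   # (1, 1)
# Two variables, two polynomials, only one invertible initial.
print(dimension_bounds(2, 2, 1))   # (0, 1)
```

When the two bounds differ, the characteristic set alone does not determine the dimension, and the decomposition of the previous section is needed.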

Examples: Consider Σ_1 = {(x - 1)(x - 2), (x - 1)(y - 1), y^2 - 1}, I_1 = (Σ_1). Assume
y > x. Its characteristic set is C_1 = {x^2 - 3x + 2, (x - 1)(y - 1)}. The initial x - 1 of the
second polynomial is not invertible. So the upper and lower bounds on the dimension of I_1
are 1 and 0, respectively. We decompose the variety of I_1 and get two sets Σ_1 ∪ {x - 2, y - 1}
and Σ_1 ∪ {x - 1}, from which we get two invertible characteristic sets C_1^1 = {x - 2, y - 1}
and C_1^2 = {x - 1, y^2 - 1}. The variety is decomposed into these two varieties defined by C_1^1
and C_1^2. The dimension of the subvarieties of I_1 is 0 in each case, thus implying that the
dimension of I_1 is 0.

If the characteristic set was computed using the ordering x > y, we get C_2 = {y^2 - 1,
(y - 1)(x - 1)}. The initial y - 1 is not invertible, so we decompose and get C_2^1 = {y + 1, x - 1},
and a new set {(x - 1)(x - 2), (x - 1)(y - 1), y^2 - 1, y - 1} from which another characteristic
set C_2^2 = {y - 1, (x - 1)(x - 2)} is generated. The dimension of each of the subvarieties in
this case is also 0.

Consider I_2 = (x^2 - 1, (x - 1)y, (x + 1)z, yz). Assume z > y > x. A characteristic set is
C_2 = {x^2 - 1, (x - 1)y, (x + 1)z}. The initials of the second and third polynomials are not
invertible. The upper and lower bounds on the dimension of I_2 are 2 and 0, respectively.
We decompose the variety of I_2 to get C_2^1 = {x + 1, y}, C_2^2 = {x - 1, z}, which gives the
dimension of I_2 to be 1.

Consider I_3 = (x^2 - 1, (x - 1)y, (x - 1)z). Assume z > y > x. A characteristic set is
C_3 = {x^2 - 1, (x - 1)y, (x - 1)z}. The initials of the second and third polynomials are not
invertible. The upper and lower bounds on the dimension of I_3 are 2 and 0, respectively.
The variety of I_3 is decomposed to get C_3^1 = {x + 1, y, z}, and C_3^2 = {x - 1}, which gives
the dimension of I_3 to be 2.
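
The components found in the last two examples can be spot-checked by substituting sample points. The sketch below, with illustrative names, confirms that two lines lie in the variety of (x^2 - 1, (x - 1)y, (x + 1)z, yz) and that a point together with the plane x = 1 lies in the variety of (x^2 - 1, (x - 1)y, (x - 1)z), matching dimensions 1 and 2. It samples integer values only, so it is a consistency check rather than a proof.

```python
def vanishes(polys, point):
    """True if every polynomial in polys is zero at the given (x, y, z)."""
    return all(p(*point) == 0 for p in polys)


gens_2 = [lambda x, y, z: x**2 - 1,
          lambda x, y, z: (x - 1) * y,
          lambda x, y, z: (x + 1) * z,
          lambda x, y, z: y * z]

gens_3 = [lambda x, y, z: x**2 - 1,
          lambda x, y, z: (x - 1) * y,
          lambda x, y, z: (x - 1) * z]

samples = range(-3, 4)

# Second ideal: the lines {x = -1, y = 0} and {x = 1, z = 0}.
lines_ok = all(vanishes(gens_2, (-1, 0, t)) and vanishes(gens_2, (1, t, 0))
               for t in samples)

# Third ideal: the point (-1, 0, 0) and the plane {x = 1}.
plane_ok = vanishes(gens_3, (-1, 0, 0)) and all(
    vanishes(gens_3, (1, s, t)) for s in samples for t in samples)

print(lines_ok, plane_ok)   # True True
```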

Consider I_4 = (x^2 - 1, (x - 1)y, (x - 1)z, z^2 - (x - 1)). Assume z > y > x. A characteristic
set is C_4 = {x^2 - 1, (x - 1)y, (x + 1)z}. The initials of the second and third polynomials are not
invertible. The upper and lower bounds on the dimension of I_4 are 2 and 0, respectively.
We decompose the variety of I_4 to get C_4^1 = {x + 1, y, z^2 + 2}, and C_4^2 = {x - 1, z}, which
gives the dimension to be 1.

Acknowledgements: I am grateful to Professor Kandri-Rody for raising pertinent
questions about Ritt’s concept of characteristic sets, which made me rethink characteristic
sets. Many ideas reported in this paper arose during conversations with Kandri-Rody
during his visit to SUNY, Albany. I thank Lakshman Y.N. for many helpful discussions.

References 

[1] J.S. Brown and J.F. Traub (1971), “On Euclid’s algorithm and the theory of
subresultants”, JACM, 18(4), 505-514.

[2] B. Buchberger (1965), Ein Algorithmus zum Auffinden der Basiselemente des
Restklassenringes nach einem nulldimensionalen Polynomideal, Ph.D. Thesis (in German),
Universität Innsbruck.

[3] B. Buchberger (1985), “Gröbner bases: An Algorithmic method in Polynomial Ideal
theory”, in Multidimensional Systems Theory, N.K. Bose, ed., D. Reidel Publishing
Co., 184-232.

[4] S.-C. Chou (1988), Mechanical Geometry Theorem Proving, D. Reidel Publishing
Company, Netherlands.

[5] S.-C. Chou and X.-S. Gao (1990), “Ritt-Wu’s decomposition algorithm and geometry
theorem proving”, Proc. of CADE-10, Kaiserslautern, Germany, Springer Verlag LNCS.

[6] G.E. Collins (1967), “Subresultants and polynomial remainder sequences”, JACM,
14(1), 128-142.

[7] G.E. Collins (1971), “The calculation of multivariate polynomial resultants”, JACM,
18(4), 515-532.

[8] C.I. Connolly, D. Kapur, J.L. Mundy, and R. Weiss (1989), “GeoMeter: A system for
modeling and algebraic manipulation”, Proc. DARPA workshop on Image
Understanding, CA, 797-804.

[9] D. Duval (1991), “Computation with algebraic numbers: the D5 method”, J. Symbolic
Computation, to appear.

[10] J.C. Faugère, P. Gianni, D. Lazard, and T. Mora (1989), Efficient computation of
zero-dimensional Gröbner bases by change of ordering, Technical Report 89-52, LITP,
Université Paris.

[11] G. Gallo and B. Mishra, “Recent Progress in Characteristic Set Computation:
Complexity and Open Problems,” Proc. of the 1992 Intl. Workshop on Mathematics
Mechanization, Wu and Cheng (eds.), Beijing, China, July 16-18, 1992, Intl. Academic
Publishers, 28-37.

[12] P. Gianni (1987), “Properties of Gröbner bases under specializations”, Proc.
EUROCAL ’87, Leipzig, Springer Verlag LNCS 378, 293-297.




[13] W. Gröbner (1949), Moderne algebraische Geometrie, Wien.

[14] C.M. Hoffman (1989), Geometric and Solid Modeling: An Introduction, Morgan
Kaufmann.

[15] M. Kalkbrener (1987), “Solving systems of algebraic equations by using Gröbner bases”,
Proc. EUROCAL ’87, Leipzig, Springer Verlag LNCS 378, 282-292.

[16] M. Kalkbrener (1992), A generalized Euclidean algorithm for solving systems of
algebraic equations, Technical Report, Mathematical Sciences Institute, Cornell University,
Ithaca, NY.

[17] A. Kandri-Rody (1984), Effective methods in the theory of polynomial ideals, Ph.D.
Thesis, Dept. of Mathematics, Rensselaer Polytechnic Institute, Troy, NY.

[18] D. Kapur and Lakshman Y.N. (1992), “Elimination Methods: An Introduction,” in
Symbolic and Numerical Computation for Artificial Intelligence, Donald, Kapur and
Mundy (eds.), Academic Press, 1992.

[19] D. Kapur and J.L. Mundy (1988), “Wu’s method and its application to perspective
viewing”, Artificial Intelligence, 37, 15-36.

[20] D. Kapur and H. Wan (1990), “Refutational proofs of geometry theorems via
characteristic set computation”, Proc. of the ACM-SIGSAM 1990 International Symposium
on Symbolic and Algebraic Computation - ISSAC ’90, Japan.

[21] D.E. Knuth (1980), Seminumerical algorithms: The art of computer programming, 2,
Second Edition, Addison Wesley, 407-408.

[22] H. Kredel and V. Weispfenning (1989), “Computing dimension and independent sets for
polynomial ideals,” Computational Aspects of Commutative Algebra, (ed. L. Robbiano),
Academic Press, London, 97-113.

[23] D. Lazard (1992), “Solving zero-dimensional algebraic systems,” J. Symbolic
Computation, 13, 117-131.

[24] D. Lazard (1991), “A new method for solving algebraic systems of positive dimension,”
Discrete Applied Math., 33, 147-160.

[25] R. Loos (1983), “Generalized polynomial remainder sequences”, in Symbolic and
Algebraic Computation (Computing Supplement 4), B. Buchberger, G.E. Collins, and R.
Loos, eds., Springer Verlag, 2nd ed., 115-137.

[26] J.F. Ritt (1950), Differential Algebra, AMS Colloquium Publications.

[27] B.L. van der Waerden (1970), Algebra, 1 and 2, Frederick Ungar Pub. Co.

[28] W. Wu (1984), “On the decision Problem and the mechanization of theorem proving
in elementary geometry”, Scientia Sinica, 21, (1978), 150-172. Also in Bledsoe and
Loveland, eds., Theorem Proving: After 25 years, Contemporary Mathematics, 29,
213-234.




[29]  W.  Wu  (1986),  “Basic  principles  of  mechanical  theorem  proving  in  geometries”,  J.  of 
System  Sciences  and  Mathematical  Sciences,  4(3),  (1984),  207-235.  Also  appeared  in 
J.  of  Automated  Reasoning,  2,  (1986),  221-252. 

[30]  W.  Wu  (1986),  “On  zeros  of  algebraic  equations  -  an  application  of  Ritt’s  principle”, 
Kexue  Tongbao,  31(1),  1-5. 

[31] W. Wu (1992), “On Char-set Method and Linear Equations Method of Polynomial
Equation Solving,” Proc. of the 1992 Intl. Workshop on Mathematics Mechanization,
Wu and Cheng (eds.), Beijing, China, July 16-18, 1992, Intl. Academic Publishers,
101-109.




1 An Approach for Solving Systems of Parametric
Polynomial Equations


Deepak  Kapur^ 

1.1  Abstract 

An approach for solving nonlinear polynomial equations involving
parameters is proposed. A distinction is made between parameters and
variables. The objective is to generate, from a system of parametric
equations, solved forms from which solutions for specific values of
parameters can be obtained without much additional computation. It
should be possible to analyze the parametrized solved forms so that it
can be determined for different parameter values whether there are
infinitely many solutions, finitely many solutions, or no solutions at all.
The approach is illustrated for two different symbolic methods for
solving parametric equations: Gröbner basis computations and
characteristic set computations. These methods are illustrated on a number of
examples.

1.2  Introduction 

Many complex phenomena can be modeled using nonlinear polynomial
equations. Examples include imaging transformations in computer
vision, computing geometric invariants, geometric and solid modeling,
constraint-based modeling, reasoning about geometry problems,
properties of chemical equilibrium, kinematics, robotics; an interested reader
may consult [13, 14, 7, 5] for many examples. Variables in these
equations can be classified into two subsets: independent variables and
dependent variables. Independent variables often correspond to input to a
phenomenon or the features of a physical subsystem in a phenomenon.
Henceforth, independent variables will also be called parameters;
dependent variables will be called just variables. For different parameter
values, dependent variables have different behavior. A designer is
usually interested in studying the phenomenon on a wide range of parameter
values.

^ Supported in part by a grant from United States Air Force Office of Scientific
Research AFOSR-91-0361. This approach was first discussed in September 1990
in a proposal entitled Integrated Symbolical and Numerical Methods for Advanced
Computer Vision to the United States Air Force Office of Scientific Research. A
preliminary version of this paper was presented at the Chinese Academy of Science,
Chengdu, China, in July 1992.

Given as input is a system of polynomial equations in variables and
parameters, henceforth called a system of parametric equations. In certain
cases, a set of constraints on parameters specifying parameter values of
interest may be given separately as part of the input. An equation can
be viewed as a nonlinear polynomial equation in variables, with the
coefficient of a term in a polynomial being an expression purely in terms
of parameters (henceforth called a parametric constraint expression or
simply a constraint).^ Instead of having to repeatedly solve polynomial
equations for many different specific parameter values, the objective is to
solve the equations once and for all, and compute solved parametric forms.
Solutions for specific parameter values can be obtained from a solved
parametric form without much additional computation.

A simple approach for solving systems of parametric equations is
proposed. The approach is used first to develop an algorithm for
computing a parametric Gröbner basis from a system of parametric polynomials.
The construction is illustrated using many examples. Then the approach
is used to develop an algorithm for computing a parametric
characteristic set. For this construction, the discussion is brief because of space
limitations. The construction is illustrated using an example. More
details can be found in an expanded version of this paper [9]. From a
parametric Gröbner basis and characteristic set, solutions of parametric
polynomial equations can be extracted.

The discussion in this paper is limited to parametric constraints
expressed as polynomials over the parameters. When parameters are
specialized to specific concrete values from some ground field, then a
constraint expression evaluates to a value in the ground field also; the ground
field used in this paper is the field of complex numbers. The
proposed approach should be applicable for more general parametric
constraints also (in particular, constraint expressions could be
transcendental, trigonometric or radical expressions, or even differential
polynomials) insofar as they constrain the coefficients of the polynomials, and
constraints can be manipulated to determine whether they are consistent
or not.

^We apologize for the abuse of the terminology since often by a constraint, we
would mean a formula and not a constraint expression. We are hoping that the
context would be able to disambiguate the use of the phrase constraint without
causing any confusion.

1.2.1 Properties of solved parametric forms

Below  are  listed  desirable  features  of  a  solver  for  a  system  of  parametric 
equations  and  the  output  generated  by  the  solver. 

1. The output should be in a solved form with constraints on parameters.
The output should be in sufficient detail so that for particular parameter
values, minimal (preferably no) additional computation has to be performed
to obtain a solved form from which solutions can be extracted. For example,
a parametric Gröbner basis should remain a Gröbner basis under a
specialization, i.e. when specific values for parameters are substituted. A
parametric characteristic set should remain a characteristic set under a
specialization.

2.  The  output  should  generate  results  for  all  possible  values  of  parameters 
(that  satisfy  the  constraints  on  parameters  in  case  such  constraints  are 
specified  as  a  part  of  the  input). 

3.  Parametric  constraints  under  which  the  system  of  equations  does  not 
have  a  solution  should  be  clearly  identified,  and  should  not  be  mixed 
with  parametric  constraints  for  which  the  system  has  a  solution. 

4. Parametric constraints under which the system of equations has finitely
many solutions should also be separately identified, and not mixed with
parametric constraints for which the system has infinitely many solutions.
Preferably, a solver should not mix parametric constraints leading to
solution spaces of different dimensions.

1.2.2  Overview  of  the  approach 

The approach discussed here is straightforward and easy to understand.
Polynomial computations are performed by making assumptions about the
parametric constraint expressions arising in the leading coefficients of
the polynomials. For Gröbner basis constructions, polynomials are
manipulated in their distributed representation, whereas for characteristic
set constructions, polynomials are manipulated in their recursive
representation. Assumptions about parametric constraint expressions are
collected as constraint sets. The concept of a constrained polynomial is
introduced as a pair consisting of a finite set of constraints and a
polynomial such that the coefficients of its terms are interpreted with
respect to the constraint set.

The leading coefficient of a constrained polynomial may or may not be
nonzero with respect to its associated constraint set. If the leading
coefficient cannot be determined to be nonzero with respect to its
constraint set, the constrained polynomial is considered ambiguous;
otherwise, it is considered unambiguous. Additional constrained polynomials
may be generated from an ambiguous constrained polynomial by extending its
constraint set to make its leading coefficient nonzero. Apart from that,
the basic structure of computations generated by algorithms remains the
same.

For Gröbner basis construction, the concepts of reduction (simplification)
of a constrained polynomial by another constrained polynomial and
S-polynomial of constrained polynomials are defined. For characteristic
set construction, the concept of a chain of constrained polynomials and
pseudodivision of a constrained polynomial using the chain are defined.
In the exposition here, algorithmic aspects have been emphasized rather
than theoretical aspects because algorithmic constructions are viewed as
the main contributions of the paper. Theoretical results are essentially
obtained in a straightforward way by modifying proofs for Gröbner basis
and characteristic set constructions on polynomial systems so that they
work for parameterized systems. The reader is assumed to be familiar
with Gröbner basis and characteristic set constructions; otherwise, the
reader can consult an introductory paper by Kapur and Lakshman [10].

1.2.3  Organization 

In the next section, assumptions about constraints on parameters are
stated. Minimal requirements on a constraint solver are discussed. The
notion of constrained polynomials is introduced. The definitions of
degree, head term, head coefficient, and head monomial are extended to
constrained polynomials in distributed representation. Unambiguous
and ambiguous constrained polynomials are defined.

Section 3 extends Gröbner basis constructions to constrained polynomials.
Key concepts such as S-polynomials and reduction are extended. A
parametric Gröbner basis is defined for a set of parametric polynomials.
Parametric Gröbner bases can be used for analyzing solutions of parametric
polynomials, computing dimension, and other properties of solution sets.
These constructions are illustrated on a number of examples.

Section 4 extends characteristic sets to constrained polynomials. Degree,
head term, initial, class, etc. are defined for constrained polynomials
in recursive representation. Definitions of pseudodivision and chains
are extended to work on constrained polynomials. Parametric characteristic
sets in Wu's sense are defined. They can be used to analyze solutions of
parametric polynomial equations. An example illustrating the construction
is briefly discussed.

During the development of this approach, we learned of related approaches
for linear equations by Sit [16, 17], for nonlinear polynomial equations
based on Wu's characteristic set method by Chou and Gao [4], and for
nonlinear polynomial equations based on the Gröbner basis method by
Weispfenning [18]. Apart from the fact that the proposed approach is
simpler and easier to understand, there appear to be considerable
differences between the methods outlined here and in the above cited
papers. Some of these differences are illustrated later using examples.

1.3  Constraints  and  constrained  polynomials 
1.3.1  Constraints 

A constraint is, in general, viewed as a formula over parameters, which is
true or false based on values substituted for free parameters appearing in
the constraint. Let $P$, called a parameter space, stand for the set of
values that a free parameter appearing in a constraint can take.^ Given a
substitution $\sigma$ of parameters from $P$ (also called a specialization
or instantiation), written as $\{u_1 \leftarrow v_1, \ldots, u_m \leftarrow
v_m\}$, where $u_1, \ldots, u_m$ are parameters and $v_1, \ldots, v_m \in
P$, a constraint $c$ and a constraint expression $cp$ specialize or
evaluate under $\sigma$. If all free variables in $c$ ($cp$, respectively)
are specialized, then $\sigma(c)$ ($\sigma(cp)$, respectively) specializes
to a truth value (a value, respectively).

A constraint $c$ is satisfiable (or consistent) if there is a
specialization $\sigma$ such that $\sigma(c)$ is true; otherwise $c$ is
unsatisfiable (or inconsistent). Similarly, a set $C$ of constraints,
called a constraint set, is consistent if and only if there is a
specialization such that the constraints in $C$ are all satisfied by that
specialization; a constraint set $C$ is thus a conjunction of constraints.
$C$ is inconsistent if the constraints in $C$ cannot be simultaneously
satisfied no matter what specialization is used. Let $spcl(C)$ be the set
of all specializations satisfying the constraints in $C$.

^In general, parameters may be typed, i.e. different parameters may range
over different sets of values. The proposed approach generalizes to typed
parameters.

Given a constraint $c$, let $neg(c)$ stand for the negation of $c$, meaning
that for a given $\sigma$ substituting for all free parameters in $c$,
$\sigma(c)$ is true if and only if $\sigma(neg(c))$ is false, and that the
set $\{c, neg(c)\}$ is inconsistent. Given a consistent constraint $c$ and
a consistent constraint set $C$, $C \cup \{c\}$ or $C \cup \{neg(c)\}$ is
consistent; $c$ is consistent with $C$ iff $C \cup \{c\}$ is consistent. A
constraint $c$ is dependent with respect to a consistent $C$ if and only if
exactly one of $C \cup \{c\}$ and $C \cup \{neg(c)\}$ is consistent. A
constraint $c$ is independent of a constraint set $C$ if both
$C \cup \{c\}$ and $C \cup \{neg(c)\}$ are consistent. Further,
$C \models c$ iff for every specialization $\sigma$, if $\sigma(C)$ is
true, then $\sigma(c)$ is true, i.e. $C \cup \{neg(c)\}$ is inconsistent.
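
The logical definitions above can be exercised on a toy instance. The sketch below is only an illustration under strong assumptions that are not in the paper: the parameter space is the small finite set `P = [0, 1, 2]`, constraints are plain Python predicates over a specialization, and consistency is decided by brute-force enumeration. All names (`specializations`, `consistent`, `entails`, `classify`) are hypothetical.

```python
from itertools import product

# Assumed toy setting: two parameters u, v over a small finite parameter space.
P = [0, 1, 2]
PARAMS = ("u", "v")

def specializations():
    """Enumerate every specialization {u <- a, v <- b} with a, b in P."""
    for values in product(P, repeat=len(PARAMS)):
        yield dict(zip(PARAMS, values))

def neg(c):
    """Negation of a constraint: true exactly where c is false."""
    return lambda s: not c(s)

def consistent(C):
    """A constraint set is a conjunction: some specialization satisfies all of it."""
    return any(all(c(s) for c in C) for s in specializations())

def entails(C, c):
    """C |= c iff C together with neg(c) is inconsistent."""
    return not consistent(C + [neg(c)])

def classify(C, c):
    """For a consistent C: 'independent' if both extensions are consistent."""
    with_c = consistent(C + [c])
    with_neg = consistent(C + [neg(c)])
    return "independent" if with_c and with_neg else "dependent"

u_is_zero = lambda s: s["u"] == 0
v_is_zero = lambda s: s["v"] == 0
sum_is_zero = lambda s: s["u"] + s["v"] == 0  # over this P, only u = v = 0
```

With `C = [u_is_zero]`, the constraint `v_is_zero` is independent of `C`; once `v_is_zero` is added, `sum_is_zero` becomes dependent and is in fact entailed.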

Requirements  on  constraint  solver 

The main requirement on a constraint solver is that it should be a decision
procedure for determining consistency of a constraint set $C$. For
efficiency, it should preferably be incremental in the sense that the
consistency check of $C \cup \{c\}$ as well as $C \cup \{neg(c)\}$ should
be able to use intermediate results from the consistency check of $C$.

Polynomial  equations  and  inequations  as  constraints 

In the above discussion, a logical view of constraints has been adopted
without fixing a language for expressing constraints. In the rest of the
paper, however, constraints are assumed to be of the following forms:
$cp = 0$ or $cp \neq 0$, where $cp$ is a polynomial over the parameters.
If $c$ is an equation $cp = 0$, then $neg(c)$ is the inequation
$cp \neq 0$, and similarly, if $c$ is an inequation $cp \neq 0$, then
$neg(c)$ is $cp = 0$. The parameter space $P$ is assumed to be an
algebraically closed field, for example, the field of complex numbers. The
constraint $cp = 0$ (respectively, $cp \neq 0$) is satisfiable (or
consistent) iff there exists a specialization $\sigma$ such that
$\sigma(cp)$ evaluates to 0 (respectively, does not evaluate to 0), and
$cp = 0$ (respectively, $cp \neq 0$) is not satisfiable if for every
possible specialization $\sigma$, $\sigma(cp)$ evaluates to a value
different from 0 (respectively, evaluates to 0).
When constraints are polynomial equations and inequations, a Gröbner
basis algorithm or a characteristic set algorithm can be used for the
consistency check. However, in the discussion below, it is assumed that
some constraint solver is available for querying whether
$C \cup \{cp = 0\}$ and $C \cup \{cp \neq 0\}$ are consistent, assuming
that $C$ is consistent.
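
One standard way such a consistency query can be reduced to an ideal membership test is the Rabinowitsch trick combined with the weak Nullstellensatz. This is offered only as a sketch of how a Gröbner basis algorithm could serve as the solver, not as the procedure assumed by the paper; the names $f_k$, $g_k$, and $z$ are introduced here for illustration.

```latex
% Assumed notation: C = { f_1 = 0, ..., f_r = 0, g_1 \neq 0, ..., g_s \neq 0 },
% z is a fresh variable, and P is the algebraically closed parameter field.
\[
  C \text{ is consistent}
  \iff
  1 \notin \langle f_1, \ldots, f_r,\; 1 - z\, g_1 g_2 \cdots g_s \rangle
  \subseteq P[u_1, \ldots, u_m, z]
\]
\[
  C \models (cp = 0)
  \iff
  C \cup \{cp \neq 0\} \text{ is inconsistent}
  \iff
  1 \in \langle f_1, \ldots, f_r,\; 1 - z\, g_1 \cdots g_s\, cp \rangle
\]
```

Membership of 1 can be decided by checking whether a reduced Gröbner basis of the ideal is $\{1\}$.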

1.3.2  Constrained  Polynomials 

A parameterized polynomial is interpreted in the context of a set of
constraints on parameters. For different values of free parameters
satisfying the set of constraints, a parameterized polynomial has a
different structure (i.e. different terms) since the coefficients of terms
may become 0. A constrained polynomial is defined as a pair
$\langle C, p \rangle$, where $p$ is a polynomial over parameters and
variables, and $C$ is a finite set of constraints over parameters. In
contrast, a polynomial without any constraints is an unconstrained
polynomial $p$ and is written as $\langle \emptyset, p \rangle$. Similarly,
a constrained polynomial $\langle C, p \rangle$ such that for every
specialization $\sigma$, $\sigma(C)$ is true, is the same as the
unconstrained polynomial $p$. As a formula, $\langle C, p \rangle$ can be
viewed as $C \Rightarrow p = 0$. A constrained polynomial
$\langle C, p \rangle$ in which $C$ is inconsistent (unsatisfiable) is said
to be undefined. Such constrained polynomials will never be generated in a
computation. Below, by a constrained polynomial $\langle C, p \rangle$, we
mean one in which $C$ is consistent, unless stated otherwise. Further,
$\langle \sigma(C), p \rangle$ can also be written as
$\langle C \cup E_\sigma, p \rangle$, where $E_\sigma$ are the equations
$\{u_1 = v_1, \ldots, u_m = v_m\}$ corresponding to
$\sigma = \{u_1 \leftarrow v_1, \ldots, u_m \leftarrow v_m\}$.
A specialization $\sigma$ can be applied to a set of constrained
polynomials $CP$ as well:
$\sigma(CP) = \{\langle \sigma(C), \sigma(p) \rangle \mid \langle C, p \rangle \in CP\}$.
If a specialization $\sigma$ instantiates all parameters in
$\langle C, p \rangle$, then either $\sigma(C)$ is inconsistent, or
$\sigma(C)$ is consistent and equivalent to $\emptyset$, and in that case
$\sigma(\langle C, p \rangle)$ is equivalent to $\sigma(p)$, an
unconstrained polynomial without any parameters.
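
The last case, a specialization that instantiates every parameter, is easy to make concrete. The following is a minimal sketch under assumed representations that are not from the paper: a constraint is a pair of a coefficient function and a relation string, a polynomial is a dictionary from term names to coefficient functions, and `specialize` is a hypothetical helper.

```python
def specialize(constraints, poly, sigma):
    """Apply a full specialization sigma to the constrained polynomial <C, p>.

    constraints: list of (coefficient_function, relation), relation "=0" or "!=0"
    poly:        dict mapping a term name to a coefficient function
    sigma:       dict giving a value for every parameter
    Returns None when sigma violates C (the specialized pair is undefined),
    otherwise the unconstrained polynomial sigma(p) with zero terms dropped.
    """
    for expr, relation in constraints:
        is_zero = expr(sigma) == 0
        if is_zero != (relation == "=0"):
            return None
    result = {}
    for term, coeff in poly.items():
        value = coeff(sigma)
        if value != 0:
            result[term] = value
    return result

# Illustrative polynomial (u + v^2) x^2 y + v y + u x with parameters u, v.
p = {
    "x^2*y": lambda s: s["u"] + s["v"] ** 2,
    "y": lambda s: s["v"],
    "x": lambda s: s["u"],
}
C = [(lambda s: s["u"] + s["v"] ** 2, "=0")]
```

For instance, `specialize(C, p, {"u": -1, "v": 1})` keeps only the terms in `y` and `x`, while a specialization violating `C` yields `None`.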

Two constrained polynomials $\langle C, p \rangle$ and
$\langle D, q \rangle$ are equivalent iff for every specialization
$\sigma$, either both $\sigma(C)$ and $\sigma(D)$ are unsatisfiable, or
both $\sigma(C)$ and $\sigma(D)$ are true and $\sigma(p) = \sigma(q)$. A
constrained polynomial $\langle C, p \rangle$ is nonzero (written as
$\langle C, p \rangle \neq 0$) if there exists a satisfying specialization
$\sigma$ of $C$ such that $\sigma(p) \neq 0$. By slightly abusing the
notation, a constrained polynomial $\langle C, p \rangle$ subsumes another
constrained polynomial $\langle D, q \rangle$ if for every specialization
$\sigma$ satisfying $D$, $\sigma(C)$ is true and $\sigma(p) = \sigma(q)$.
This implies that
$\{\langle C, p \rangle, \langle D, q \rangle\} = \{\langle C, p \rangle\}$.
Obviously, $\langle C, p \rangle$ is equivalent to $\langle D, q \rangle$
if $\langle C, p \rangle$ subsumes $\langle D, q \rangle$ and
$\langle D, q \rangle$ subsumes $\langle C, p \rangle$. Given
$\langle C, p \rangle$, its simplified form is $\langle D, q \rangle$,
where $D$ is a simplified form of $C$ and $q$ only includes those terms of
$p$ whose coefficients cannot be deduced to be 0 from $C$.

A set of constrained polynomials $CP$ can be simplified by removing any
constrained polynomial from $CP$ that is subsumed by another constrained
polynomial in $CP$. Two sets of constrained polynomials $CP$ and $DP$ are
equivalent if and only if for every specialization $\sigma$,
$\sigma(CP) = \sigma(DP)$.

1.3.3  Degree,  head  term 

The degree, head term, and head coefficient of a constrained polynomial
are determined by its constraint set. As in the case of polynomials, a
constrained polynomial can be viewed as a constrained univariate
polynomial in one of its variables (recursive representation), or as a
constrained multivariate polynomial (distributed representation).
Recursive representation is useful for univariate resultant computation
and for computing characteristic sets, whereas distributed representation
is useful for multivariate resultant computation and for Gröbner basis
computations. In this subsection and the section on Gröbner basis
computations, distributed representation of polynomials is discussed. In
the section on characteristic set computations, recursive representation
of polynomials will be used.

In distributed representation, the coefficient of a term in a polynomial
is a constraint expression. This coefficient may or may not be zero
depending upon the constraint set associated with the polynomial. Let $>$
be a total admissible ordering on terms [2]. The potential degree of a
constrained polynomial $\langle C, p \rangle$ is defined as the degree of
the highest term $t$ in $p$ such that (i) it is not the case that
$C \models (c = 0)$, where $c$ is the coefficient of $t$ in $p$, and
(ii) for all terms $t' > t$, the coefficient $c'$ of $t'$ in $p$ is such
that $C \models (c' = 0)$. Term $t$ is then called the potential head term
of $\langle C, p \rangle$; $c$ and $ct$ are, respectively, called the
potential head coefficient and potential head monomial of
$\langle C, p \rangle$. Let $phm$, $pht$, $phc$ stand, respectively, for
functions for computing the potential head monomial, potential head term
and potential head coefficient of a constrained polynomial.

The degree of a constrained polynomial $\langle C, p \rangle$ is defined
as the degree of the highest term, say $t$, in $p$ whose coefficient, say
$c$, is such that $C \cup \{c = 0\}$ is inconsistent, i.e. $c$ cannot be 0
with respect to $C$. Head term, head coefficient and head monomial of a
constrained polynomial are defined in a similar manner. Let $hm$, $ht$,
$hc$ stand, respectively, for functions for computing the head monomial,
head term and head coefficient of a constrained polynomial.

The distinction between the potential degree and degree of a constrained
polynomial $\langle C, p \rangle$ is that its potential degree is greater
than or equal to its degree because $p$ may have terms whose coefficients
may or may not be 0 with respect to $C$.

1.3.4  Unambiguous  and  ambiguous  constrained  polynomials 

A constrained polynomial $\langle C, p \rangle$ is unambiguous if the
potential head monomial of $\langle C, p \rangle$ is also the head
monomial of $\langle C, p \rangle$, i.e. (i)
$phm(\langle C, p \rangle) = a_i t_i$, (ii) $C \cup \{a_i = 0\}$ is
inconsistent, and (iii) $C \cup \{a_i \neq 0\}$ is consistent. Otherwise,
$\langle C, p \rangle$ is called ambiguous. Further, as more constraints
are added to $C$ without destroying its consistency, the head term of an
unambiguous $\langle C, p \rangle$ does not change, i.e. for an
unambiguous $\langle C, p \rangle$, for each consistent constraint set $D$
such that $spcl(D) \subseteq spcl(C)$,
$pht(\langle C, p \rangle) = pht(\langle D, p \rangle)$. But for an
ambiguous constrained polynomial, its potential head term and head
coefficient may change as more constraints are added to $C$.

For example, consider
$p = u x^2 y + v^2 x^2 y + v y + u x = (u + v^2) x^2 y + v y + u x$,
where $u, v$ are parameters and $x, y$ are variables. In the distributed
representation, in which $p$ is viewed as a multivariate polynomial, under
the total degree ordering induced by the variable ordering $y > x$, the
degree of $p$ relative to the empty constraint set is 0; the potential
degree of $\langle \emptyset, p \rangle$ is 3, and the potential head term
is $x^2 y$ with $u + v^2$ being the potential head coefficient. The degree
of $p$ relative to $\{u + v^2 \neq 0\}$ is 3 and the head term is $x^2 y$.
The degree of $p$ relative to $\{u + v^2 = 0\}$ is still 0 since it cannot
be said whether $v = 0$, $u = 0$ or not. The degree of $p$ relative to
$\{u + v^2 = 0, v \neq 0\}$ is 1, and the head term is $y$. The simplified
form of $\langle \{u + v^2 = 0\}, p \rangle$ is
$\langle \{u + v^2 = 0\}, vy + ux \rangle$.
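
The distinction between potential head term and head term in this example can be traced with a small sketch. It rests on a deliberately crude assumption that is not the paper's solver: entailment is approximated syntactically, so $C \models (c = 0)$ holds only when `c` is listed in `zeros`, and $C \cup \{c = 0\}$ is inconsistent only when `c` is listed in `nonzeros`. Coefficients are opaque strings, and all function names are hypothetical.

```python
def term_key(exps):
    # exps = (degree in y, degree in x): total degree first, then lex with y > x
    return (sum(exps), exps)

def ordered(poly):
    """Terms of p from highest to lowest under the term ordering."""
    return sorted(poly, key=lambda m: term_key(m[1]), reverse=True)

def potential_head(poly, zeros):
    """Highest term whose coefficient is not known to be 0 under C."""
    for coeff, exps in ordered(poly):
        if coeff not in zeros:
            return coeff, exps
    return None

def head(poly, nonzeros):
    """Highest term whose coefficient is known to be nonzero under C."""
    for coeff, exps in ordered(poly):
        if coeff in nonzeros:
            return coeff, exps
    return None

def degree(poly, nonzeros):
    h = head(poly, nonzeros)
    return sum(h[1]) if h else 0

def is_unambiguous(poly, zeros, nonzeros):
    ph = potential_head(poly, zeros)
    return ph is not None and ph == head(poly, nonzeros)

# p = (u + v^2) x^2 y + v y + u x, as (coefficient, (deg_y, deg_x)) pairs
p = [("u+v^2", (1, 2)), ("v", (1, 0)), ("u", (0, 1))]
```

Under the empty constraint set the polynomial is ambiguous; adding `u+v^2` to `nonzeros`, or adding it to `zeros` together with `v` in `nonzeros`, makes it unambiguous with degrees 3 and 1 respectively.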

Unambiguous  polynomials  from  an  ambiguous  polynomial 

From an ambiguous constrained polynomial $\langle C, p \rangle$ with the
potential head coefficient $a_i$, an equivalent finite set of unambiguous
constrained polynomials can be generated based on the following property:
$\{\langle C, p \rangle\} = \{\langle C \cup \{a_i \neq 0\}, p \rangle,
\langle C \cup \{a_i = 0\}, p \rangle\}$. The first element is an
unambiguous constrained polynomial, whereas the second element could be
ambiguous, for which the construction is repeated until all the
constrained polynomials generated are unambiguous.

Lemma  1  Given  a  finite  set  of  ambiguous  constrained  polynomials,  there 
exists  an  equivalent  finite  set  of  unambiguous  constrained  polynomials, 
and  this  set  can  be  obtained  using  the  above  algorithm. 
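
The splitting construction behind Lemma 1 can be sketched as a short recursion. The sketch assumes, for illustration only, that the coefficients are distinct independent parameters (as in $p = u x^2 + v x + w$), so that a syntactic lookup in the sets `zeros` and `nonzeros` is an adequate stand-in for the constraint solver; with general coefficient expressions a real solver would be needed to discard inconsistent branches. The function name and return format are hypothetical.

```python
def split_unambiguous(coeffs, zeros=frozenset(), nonzeros=frozenset()):
    """Split <C, p> into unambiguous pieces.

    coeffs lists the coefficient names of p from the highest term to the lowest.
    C is represented by the sets zeros (c = 0) and nonzeros (c != 0).
    Returns a list of (zeros, nonzeros, head_coefficient) triples; the head
    coefficient is None on the branch where every coefficient vanishes.
    """
    for c in coeffs:
        if c in zeros:
            continue  # this term vanishes under C, look at the next one
        if c in nonzeros:
            return [(zeros, nonzeros, c)]  # already unambiguous
        # potential head coefficient c is ambiguous: split on c != 0 and c = 0
        first = (zeros, nonzeros | {c}, c)
        rest = split_unambiguous(coeffs, zeros | {c}, nonzeros)
        return [first] + rest
    return [(zeros, nonzeros, None)]

# p = u x^2 + v x + w under the empty constraint set
branches = split_unambiguous(["u", "v", "w"])
```

The example yields four branches, with head coefficients `u`, `v`, `w`, and finally the branch in which $p$ simplifies to 0.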

Compactification 

It is also possible to define a transformation from unambiguous
constrained polynomials to possibly ambiguous constrained polynomials,
especially if a set includes constrained polynomials with the same
polynomial component but possibly with different constraint sets. This
uses the property that
$\{\langle C \cup \{a_i \neq 0\}, p \rangle, \langle C \cup \{a_i = 0\}, p \rangle\} = \{\langle C, p \rangle\}$
if $phc(\langle C, p \rangle) = a_i$, and neither $C \models (a_i = 0)$
nor $C \models (a_i \neq 0)$.

In general, given two constrained polynomials $\langle C, p \rangle$ and
$\langle D, p \rangle$,
$\{\langle C, p \rangle, \langle D, p \rangle\} = \{\langle E, p \rangle\}$
if it is possible to generate a constraint set $E$ that corresponds to the
disjunction of constraint sets $C$ and $D$, since then
$spcl(E) = spcl(C) \cup spcl(D)$. Often this may be possible because of
the way constraints get associated with a constrained polynomial.

1.4 Gröbner Basis Computation

In this section, Gröbner basis computations are extended to constrained
polynomials. The reader is assumed to be familiar with concepts and
definitions related to Gröbner basis computations, in particular the
notions of reduction of a polynomial by another polynomial, S-polynomials
and the completion algorithm. For an introduction to Gröbner bases, the
reader may consult [1, 2, 7, 10]. In a constrained polynomial
$\langle C, p \rangle$, the constraint set $C$ is assumed to be
consistent. Further, a specialization $\sigma$ is assumed to substitute
values for all parameters in a set of constrained polynomials.

Definition 1 A finite set $GCB$ of constrained polynomials is a parametric
Gröbner basis if and only if for every specialization $\sigma$ of
parameters in $GCB$, $\sigma(GCB)$ is equivalent to a finite set $GB$ of
polynomials that is a Gröbner basis.

The above definition captures the fact that for any specialization,
a Gröbner basis can be obtained from a parametric Gröbner basis by
substituting for the parameters and simplifying the constrained
polynomials, thus satisfying one of the requirements for solved forms
stated in the introduction.

Definition 2 A finite set $GCB$ of constrained polynomials is a parametric
Gröbner basis of a set $CB$ of constrained polynomials if and only if
(i) $GCB$ is a parametric Gröbner basis, and (ii) for every specialization
$\sigma$ of parameters in $CB$, $\sigma(GCB)$ is equivalent to a finite
set $GB$ of polynomials, which is a Gröbner basis, generating the same
ideal as the ideal generated by the set $B$ of polynomials equivalent to
$\sigma(CB)$.

1.4.1 S-polynomial computations

An important operation in a Gröbner basis computation is that of computing
an S-polynomial of a pair of distinct polynomials (also called a critical
pair in the term rewriting literature). The simplification of a polynomial
by another polynomial can be viewed as a special case of S-polynomial
computation. Below we extend the concept of an S-polynomial to constrained
polynomials.

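For two nonequivalent constrained polynomials $\langle C, p \rangle$ and
$\langle D, q \rangle$ such that $C \cup D$ is consistent, their
constrained S-polynomial is defined as follows. Let
$phm(\langle C \cup D, p \rangle) = a_i t_i$ and
$phm(\langle C \cup D, q \rangle) = a_j t_j$, such that there exist
smallest $f_i, f_j$ with $f_i t_i = f_j t_j$, $f_i \neq t_j$, and
$f_j \neq t_i$. Then

$s\text{-}poly(\langle C, p \rangle, \langle D, q \rangle) = \langle C \cup D,\; a_j f_i p - a_i f_j q \rangle$.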

For example, given

$cp_1 = \langle \{v \neq 0\},\; vxy + ux^2 + x \rangle$, $cp_2 = \langle \{u \neq 0\},\; uy^2 + x^2 \rangle$,

since the combined constraint set $\{u \neq 0, v \neq 0\}$ is consistent,
and assuming a lexicographic order on terms defined by the variable order
$y > x$, $phm(\langle \{u \neq 0, v \neq 0\}, vxy + ux^2 + x \rangle) = vxy$
and $phm(\langle \{u \neq 0, v \neq 0\}, uy^2 + x^2 \rangle) = uy^2$;
$a_1 = v$, $a_2 = u$, $t_1 = xy$, $f_1 = y$, $t_2 = y^2$, $f_2 = x$. The
S-polynomial of $cp_1$ and $cp_2$ is

$cp_3 = \langle \{u \neq 0, v \neq 0\},\; u^2 x^2 y + uxy - vx^3 \rangle$.

The S-polynomial of two unambiguous constrained polynomials need
not be unambiguous.
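
The polynomial part of this S-polynomial can be checked mechanically. The sketch below assumes the two inputs are $vxy + ux^2 + x$ and $uy^2 + x^2$ and treats parameters and variables uniformly as exponent tuples ordered `(u, v, y, x)`; the head coefficient and head term of each input are passed in by hand because computing them would require the constraint solver. All helper names are hypothetical.

```python
# A polynomial is a dict {exponent tuple (u, v, y, x): integer coefficient}.

def poly(*monomials):
    """Build a polynomial from (coefficient, exponent tuple) pairs."""
    out = {}
    for c, e in monomials:
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def sub(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return {e: c for e, c in out.items() if c != 0}

def cofactors(t_i, t_j):
    """Smallest f_i, f_j with f_i * t_i == f_j * t_j, via the lcm of the terms."""
    lcm = tuple(max(a, b) for a, b in zip(t_i, t_j))
    f_i = tuple(m - a for m, a in zip(lcm, t_i))
    f_j = tuple(m - b for m, b in zip(lcm, t_j))
    return f_i, f_j

def s_poly(p, a_i, t_i, q, a_j, t_j):
    """Polynomial part a_j * f_i * p - a_i * f_j * q of the constrained S-polynomial."""
    f_i, f_j = cofactors(t_i, t_j)
    left = mul(mul(a_j, {f_i: 1}), p)
    right = mul(mul(a_i, {f_j: 1}), q)
    return sub(left, right)

U, V = {(1, 0, 0, 0): 1}, {(0, 1, 0, 0): 1}
p1 = poly((1, (0, 1, 1, 1)), (1, (1, 0, 0, 2)), (1, (0, 0, 0, 1)))  # vxy + ux^2 + x
p2 = poly((1, (1, 0, 2, 0)), (1, (0, 0, 0, 2)))                      # uy^2 + x^2
result = s_poly(p1, V, (0, 0, 1, 1), p2, U, (0, 0, 2, 0))
expected = poly((1, (2, 0, 1, 2)), (1, (1, 0, 1, 1)), (-1, (0, 1, 0, 3)))  # u^2x^2y + uxy - vx^3
```

The leading monomials $uvxy^2$ cancel, and `result` coincides with the polynomial part of $cp_3$.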

Lemma 2 For any specialization $\sigma$ satisfying $C \cup D$ and the
constrained S-polynomial $\langle C \cup D, a_j f_i p - a_i f_j q \rangle$
of $\langle C, p \rangle$ and $\langle D, q \rangle$, let $p'$, $q'$, $r'$
be the polynomials equivalent to $\sigma(\langle C, p \rangle)$,
$\sigma(\langle D, q \rangle)$,
$\sigma(\langle C \cup D, a_j f_i p - a_i f_j q \rangle)$, respectively.
Then $r'$ is the S-polynomial of $p'$ and $q'$.

The above lemma establishes the soundness of S-polynomial computation.
It is not necessary to assume that either of $\langle C, p \rangle$ and
$\langle D, q \rangle$ be unambiguous, because even if $a_i$ or $a_j$ is 0
for a particular specialization, the result is still in the required
ideal. It is perhaps better to define S-polynomials for pairs of
unambiguous polynomials only; however, we do not have much experimental
experience with these two different heuristics to say which one is better.
All the calculations below are illustrated using unambiguous polynomials.

1.4.2  Reduction 

For reduction (rewriting), it is more useful for both constrained
polynomials to be unambiguous, as otherwise a step may not actually reduce
anything. If the constrained polynomial being reduced is ambiguous and its
head coefficient can be 0, the result may not be smaller than the original
constrained polynomial. If the constrained polynomial used to reduce
another polynomial is ambiguous and its head coefficient can be 0, then
again no reduction takes place.

Reduction as a special case of S-polynomial Computation

Given two unambiguous constrained polynomials $\langle C, p \rangle$ and
$\langle D, q \rangle$, $\langle C, p \rangle$ can be reduced (rewritten)
by $\langle D, q \rangle$ if $C \cup D$ is consistent,
$hm(\langle C \cup D, p \rangle) = hm(\langle C, p \rangle) = a_i t_i$,
$hm(\langle C \cup D, q \rangle) = hm(\langle D, q \rangle) = a_j t_j$,
and $t_i = f_j t_j$ for some $f_j$. In that case,
$\langle C \cup D, a_j p \rangle$ is said to reduce to (or rewrite to)
$\langle C \cup D, a_j p - a_i f_j q \rangle$ in a single step.^4 It is
easy to see that the result is smaller than
$\langle C \cup D, a_j p \rangle$ in a well-founded order on the
polynomial part of the constrained polynomials. This ensures that the
reduction process always terminates. The reader will have noticed that
this computation is a special case of the S-polynomial computation
discussed above.

^4 Since $p$ being in an ideal implies that $a_j p$ is also in the ideal,
it is acceptable to reduce $a_j p$ instead of $p$.
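
As an illustration of a single reduction step, the following worked computation reduces the polynomial $u^2 x^2 y + uxy - vx^3$ under $\{u \neq 0, v \neq 0\}$ by $vxy + ux^2 + x$ under $\{v \neq 0\}$, using the lexicographic order with $y > x$. The choice of this pair is an assumption made here for illustration and is not an example taken from the paper.

```latex
% p = u^2 x^2 y + u x y - v x^3,  C = {u \neq 0, v \neq 0},  hm = u^2 \cdot x^2 y
% q = v x y + u x^2 + x,          D = {v \neq 0},            hm = v \cdot x y
\begin{align*}
  a_i &= u^2, \quad t_i = x^2 y, \quad a_j = v, \quad t_j = xy, \quad f_j = x \\
  a_j p - a_i f_j q
    &= v\,(u^2 x^2 y + u x y - v x^3) - u^2 x\,(v x y + u x^2 + x) \\
    &= u v\, x y - (u^3 + v^2)\, x^3 - u^2 x^2
\end{align*}
```

The head term of the result is $xy$, which is smaller than $x^2 y$, and since $C \models D$ in this instance the combined constraint set $C \cup D$ is just $C$.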


Unlike in the case of the classical Gröbner basis construction,
$\langle C \cup D, a_j p - a_i f_j q \rangle$ cannot replace
$\langle C, p \rangle$ unless $C \models D$, as otherwise for a
specialization $\sigma$ satisfying $C$ but not $D$, the ideal would not be
the same. If $C \models D$, then $\langle C, p \rangle$ can be replaced by
$\langle C, a_j p - a_i f_j q \rangle$, or deleted if
$\langle C, a_j p - a_i f_j q \rangle$ is equivalent to
$\langle C, 0 \rangle$.


Reduction  as  Replacement 


It is possible to define reduction so that the constrained polynomial being
reduced can always be replaced by a finite set of constrained polynomials.
If it is not the case that $C \models D$, then $\langle C, p \rangle$ can
be replaced by $\langle C \cup D, a_j p - a_i f_j q \rangle$ and the set
$T = \{\langle C \cup \{neg(d)\}, p \rangle \mid d \in D$,
$C \cup \{neg(d)\}$ is consistent and $p$ does not simplify to 0 with
respect to $C \cup \{neg(d)\}\}$. The family of constraint sets
$\{C \cup \{neg(d)\} \mid d \in D\}$ represents all specializations which
satisfy $C$ but do not satisfy $D$.^5 A constrained polynomial
$\langle C, p \rangle$ is said to reduce in a single step to
$\{\langle C \cup D, a_j p - a_i f_j q \rangle\} \cup T$, in which at
least one polynomial is smaller than $p$ and all other constrained
polynomials have constraint sets which are strictly implied by $C$.
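
The family of constraint sets used to build $T$ can be generated with a few lines of code. This is a sketch under assumptions not made in the paper: a constraint is a pair of an opaque coefficient name and a relation string, and the consistency oracle is passed in as a function. The demonstration oracle `no_direct_clash` only detects a constraint occurring together with its own negation, and the condition that $p$ does not simplify to 0 is omitted. All names are hypothetical.

```python
def neg(constraint):
    """Negate an equation into an inequation and vice versa."""
    expr, relation = constraint
    return (expr, "!=0" if relation == "=0" else "=0")

def complement_family(C, D, consistent):
    """Constraint sets C U {neg(d)}, d in D, that the oracle reports consistent.

    Together they cover the specializations that satisfy C but not D.
    """
    family = []
    for d in sorted(D):
        candidate = frozenset(C) | {neg(d)}
        if consistent(candidate):
            family.append(candidate)
    return family

def no_direct_clash(constraints):
    """Toy oracle: inconsistent only if some c and neg(c) both occur."""
    return not any(neg(c) in constraints for c in constraints)

C = frozenset({("u", "!=0")})
D = frozenset({("u", "!=0"), ("v", "!=0")})
family = complement_family(C, D, no_direct_clash)
```

Here the candidate built from `("u", "!=0")` is discarded because it clashes with `C`, leaving only the constraint set in which `v` is 0.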

A single-step reduction on a finite set $S$ of constrained polynomials is
defined as follows: $S$ reduces to $S'$ in one step by
$\langle D, q \rangle$ if there is a constrained polynomial
$\langle C, p \rangle \in S$ that reduces by $\langle D, q \rangle$ in a
single step to a finite set $T$ of constrained polynomials and
$S' = (S - \{\langle C, p \rangle\}) \cup T$. The multi-step reduction is
then the transitive closure of the single-step reduction. Multi-step
reduction terminates because at least one polynomial in the polynomial
components of constrained polynomials is getting smaller in the
well-founded order.

An unambiguous constrained polynomial $\langle C, p \rangle$ is reduced
(or, in a normal form) with respect to a set of unambiguous constrained
polynomials $GP$ if it cannot be reduced by any element in $GP$.
Similarly, a set $S$ of unambiguous constrained polynomials is reduced
with respect to $GP$ if every polynomial in $S$ is reduced with respect to
$GP$.

A constrained polynomial $\langle C, p \rangle$ reduces to 0 with respect
to a set of unambiguous constrained polynomials $GP$ if
$\langle C, p \rangle$ reduces in finitely many steps to a finite set of
constrained polynomials equivalent to 0.

5 If we use complex constraints that are propositional formulas built from
basic constraints of the form $ce = 0$, then an alternate and more compact
representation would be
$C \cup \{neg(d_1) \vee \cdots \vee neg(d_k) \mid d_1, \ldots, d_k \in D\}$.
A disjunction of constraints that are equations can be replaced by the
product of the polynomials in the equations, i.e.
$ce_1 = 0 \vee ce_2 = 0$ can be replaced by $ce_1 * ce_2 = 0$.

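Lemma 3 Given two nonequivalent unambiguous $\langle C, p \rangle$ and
$\langle D, q \rangle$ such that $C \cup D$ is consistent, and
$hm(\langle C \cup D, p \rangle) = hm(\langle C, p \rangle) = a_i t_i$,
$hm(\langle C \cup D, q \rangle) = hm(\langle D, q \rangle) = a_j t_j$,
$t_i = f_j t_j$ for some $f_j$, and $\langle C, a_j p \rangle$ reduces by
$\langle D, q \rangle$ to
$S = \{\langle C \cup D, a_j p - a_i f_j q \rangle\} \cup
\{\langle C \cup \{neg(d)\}, p \rangle \mid d \in D$,
$C \cup \{neg(d)\}$ is consistent,
$\langle C \cup \{neg(d)\}, p \rangle \neq 0\}$. For every specialization
$\sigma$ satisfying $C$, if $\sigma$ also satisfies $D$, then $p'$ reduces
by $q'$ to $r'$, where $p'$, $q'$, $r'$ are equivalent to
$\sigma(\langle C, p \rangle)$, $\sigma(\langle D, q \rangle)$,
$\sigma(\langle C \cup D, a_j p - a_i f_j q \rangle)$, respectively, and
$r'$ is in the ideal of $p'$ and $q'$. Further, if $\sigma$ does not
satisfy $D$, then $S$ includes a constrained polynomial
$\langle D', p \rangle$, where $D'$ is satisfied by $\sigma$.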

1.4.3 Test for a Parametric Gröbner Basis

A test for a set of unambiguous constrained polynomials to be a parametric
Gröbner basis can be given using S-polynomial computation. (If the set
includes ambiguous constrained polynomials, they can be replaced by an
equivalent set of unambiguous constrained polynomials; the test applies to
a set including ambiguous polynomials also.)

Theorem 1 A set $CP$ of constrained polynomials is a parametric Gröbner
basis if and only if for every pair of nonequivalent unambiguous
constrained polynomials $\langle C, p \rangle$ and $\langle D, q \rangle$
from $CP$ such that $C \cup D$ is consistent, their S-polynomial (i.e. the
set of unambiguous constrained polynomials equivalent to it) reduces to 0
by $CP$.

The proof of the theorem follows the pattern of a proof in the case of
the classical Gröbner basis test for polynomials.

1.4.4 Parametric Gröbner Basis Algorithm

The above test can be used to design an algorithm for computing a
parametric Gröbner basis from a set of constrained polynomials, as in the
case of a classical Gröbner basis algorithm on polynomials. A simple
version of such an algorithm takes a finite set of polynomials as input,
makes constrained polynomials out of them by associating the empty
constraint set with each polynomial, and repeatedly performs the following
steps:

1. replace every ambiguous constrained polynomial by an equivalent set of
unambiguous constrained polynomials,

2. for every pair of nonequivalent unambiguous constrained polynomials,

(a) generate an S-polynomial, if any,

(b) find an equivalent set of unambiguous constrained polynomials to it,

(c) reduce each of these unambiguous constrained polynomials to a normal
form, which itself is a finite set of constrained polynomials, and

(d) if a nonzero normal form is generated, add all nonzero constrained
polynomials in the normal form to the basis, and compute new
S-polynomials with other polynomials in the basis.

This process is continued until all S-polynomials among pairs of constrained
polynomials have been considered and shown to reduce to 0. Suitable data
structures and bookkeeping mechanisms can be developed to avoid unnecessary
computations. Optimizations suggested in the literature for
identifying unnecessary S-polynomials extend to parametric Gröbner
basis computations also.

The termination of the above algorithm is based on Hilbert's basis
condition and the following additional argument. Every term in a constrained
polynomial could possibly become a head term. At any given
iteration, there are only finitely many possibilities for generating different
Gröbner bases, assuming that all parameters are specialized.
If the above algorithm does not terminate, there is an infinite path in the
tree of possible Gröbner bases by König's lemma, since the branching
factor of this tree is finite. But the existence of an infinite path is
impossible since it contradicts Hilbert's basis condition.

The method presented here has been tried on many examples including
those in [18] and [4]. Our outputs are different from the ones reported
in [18]. Example 1 in [18], for instance, is a discussion of a set of two
generic univariate quadratic polynomials; a Gröbner basis given in [18]
includes two parametric cubic polynomials. In our computation, such
higher degree polynomials are not computed.

From a parametric Gröbner basis consisting of unambiguous constrained
polynomials, a subset which constitutes a parametric Gröbner
basis with respect to a constraint set $C$ can be extracted. For every
constrained polynomial $\langle D, p \rangle$ in the parametric Gröbner basis such
that $C \cup D$ is consistent, a new constrained polynomial $\langle D \cup C, p \rangle$ is
included. For a specialization $\sigma$, the subset of constrained polynomials
whose constraints are satisfied by $\sigma$ is equivalent to a set of polynomials
that constitute a Gröbner basis in the usual sense. A Gröbner basis thus
obtained may have redundant elements; any polynomial in a Gröbner
basis whose head monomial can be simplified using other polynomials is
redundant.

Using Parametric Gröbner Basis for Analyzing Solutions

Just like a Gröbner basis for polynomials, a parametric Gröbner basis can
be used to study and analyze the solution space of a system of parametric
polynomial equations with respect to a constraint set on parameters.
It is possible to compute the dimension of different components of the
variety, as well as other properties of the associated ideal and
variety, for different constraints on parameters. This is done in a way
similar to computing the dimension of a polynomial ideal from its Gröbner
basis. For parametric polynomials in which the ideal generated is
zero-dimensional for every value of the parameters, it is possible to use the
basis conversion algorithm of [6] to obtain a parametric Gröbner basis with
respect to a lexicographic ordering from a Gröbner basis constructed using
another term ordering such as degree ordering or reverse lexicographic
ordering. Computations of solutions from parametric Gröbner bases are
briefly illustrated later.

Example 1: We illustrate the Gröbner basis construction on a simple
example that models a chemical system and explains chemical equilibrium.
The example is discussed in [18] and [4], so it also contrasts the
proposed approach with the methods discussed in [18] and [4] and
illustrates differences in computational steps and outputs. The set of
input polynomials is:

\[
\begin{aligned}
\{\, p_1 &= x_4 - (a_4 - a_2), \\
p_2 &= x_1 + x_2 + x_3 + x_4 - (a_1 + a_3 + a_4), \\
p_3 &= x_1 x_3 x_4 - a_1 a_3 a_4, \\
p_4 &= x_1 x_3 + x_1 x_4 + x_2 x_3 + x_3 x_4 - (a_1 a_4 + a_1 a_3 + a_3 a_4) \,\},
\end{aligned}
\]

where $a_1, a_2, a_3, a_4$ are parameters, and $x_1, x_2, x_3, x_4$ are variables. The
lexicographic ordering on terms defined by $x_1 > x_2 > x_3 > x_4$ is used.
The corresponding unambiguous constrained polynomials are:

1: $\langle \emptyset, p_1 \rangle$,  2: $\langle \emptyset, p_2 \rangle$,  3: $\langle \emptyset, p_3 \rangle$,  4: $\langle \emptyset, p_4 \rangle$.

Polynomials 3 and 4 can be simplified using 1 and 2 in the classical
way since the associated constraint sets are empty. Polynomial 3 gets
simplified to 3' and replaces 3. 3': $\langle \emptyset, q_3 \rangle$, where
\[ q_3 = (a_4 - a_2)x_2 x_3 + (a_4 - a_2)x_3^2 - (a_1 + a_2 + a_3)(a_4 - a_2)x_3 + a_1 a_3 a_4. \]
Similarly, 4 can be replaced by 4': $\langle \emptyset, q_4 \rangle$, where
\[ q_4 = (a_4 - a_2)x_2 + x_3^2 - (a_1 + a_2 + a_3)x_3 + (a_1 a_2 + a_2^2 + a_2 a_3 + a_1 a_3 - a_2 a_4). \]
Neither 3' nor 4' is unambiguous;
from 4' we get 5: $\langle \{a_4 - a_2 \neq 0\}, q_4 \rangle$ and 6: $\langle \{a_4 - a_2 = 0\}, q_4 \rangle$,
both of which are unambiguous. Similarly, from 3', we get
7: $\langle \{a_4 - a_2 \neq 0\}, q_3 \rangle$ and 8: $\langle \{a_4 - a_2 = 0\}, q_3 \rangle$. Polynomial 8 is
ambiguous and exhibits an interesting case since after simplification, it
is equivalent to $\langle \{a_4 - a_2 = 0\}, a_1 a_3 a_4 \rangle$. Unambiguous polynomials
equivalent to it are: 9: $\langle \{a_4 - a_2 = 0,\ a_1 a_3 a_4 \neq 0\}, a_1 a_3 a_4 \rangle$, and
10: $\langle \{a_4 - a_2 = 0,\ a_1 a_3 a_4 = 0\}, 0 \rangle$.

This reveals that the original system is inconsistent for the constraint
set $\{a_4 - a_2 = 0,\ a_1 a_3 a_4 \neq 0\}$. This also follows from the original system
of parametric polynomials above; for parameter values satisfying these
constraints, there is indeed inconsistency since if $a_4 = a_2$, then $x_4 = 0$,
simplifying $p_3$ to the nonzero constant $-a_1 a_3 a_4$.

Polynomial 5 can be used to simplify polynomial 7 and the result,
7': $\langle \{a_4 - a_2 \neq 0\}, q_7 \rangle$, where
\[ q_7 = x_3^3 - (a_1 + a_4 + a_3)x_3^2 + (a_1 a_4 + a_3 a_4 + a_1 a_3)x_3 - a_1 a_3 a_4, \]
replaces 7.

The set {1, 2, 5, 6, 7', 9} constitutes a parametric Gröbner basis.

It can be easily checked that every specialization of the above parametric
Gröbner basis that satisfies one of the constraint sets $\{a_4 - a_2 \neq 0\}$,
$\{a_4 - a_2 = 0,\ a_1 a_3 a_4 = 0\}$, or $\{a_4 - a_2 = 0,\ a_1 a_3 a_4 \neq 0\}$ is a
Gröbner basis. Further, the above constraint sets constitute a complete
cover in the sense that every specialization satisfies exactly one of the
constraint sets. For each constraint set, its Gröbner basis can be used to
analyze solutions of the original parametric system. For the constraint
set $\{a_4 - a_2 = 0,\ a_1 a_3 a_4 \neq 0\}$, the parametric equations are inconsistent;
the dimension of the associated solution set is $-1$ because for
such specializations, there are no solutions. If a specialization satisfies
$\{a_4 = a_2,\ a_1 a_3 a_4 = 0\}$, then $x_4 = 0$, $x_3$ can be obtained by solving
$q_4$, and $x_1$ and $x_2$ are related by $p_2$, resulting in a solution set of
dimension 1. For specializations in which $a_4 \neq a_2$, the solution set is
zero-dimensional since for every variable, there is a polynomial in the
Gröbner basis whose leading term is a power product in that variable
alone: $x_4 = a_4 - a_2$, values of $x_3$ are given by $q_7$, from which the
corresponding values of $x_2$ and $x_1$ can be obtained using $q_4$ and $p_2$.

This result is different from the comprehensive Gröbner basis given in
[18] or ascending sets given in [4].⁶ In [18], many extra and complicated
polynomials appear in the comprehensive basis given as the result. In
[4], there are too many cases given which can be easily combined; in
contrast the above result is compact, very much in Sit's sense [16].
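The generic case of Example 1 can be checked numerically. The following is a minimal sketch, not part of the original computation: it assumes $a_4 \neq a_2$, uses the observation (verifiable by expansion) that $q_7 = (x_3 - a_1)(x_3 - a_3)(x_3 - a_4)$, and back-substitutes through $q_4$ and $p_2$. The function names and the sample parameter values are illustrative assumptions.

```python
from fractions import Fraction


def generic_solutions(a1, a2, a3, a4):
    """Back-substitute through polynomials 1, 7', 5 and 2 (case a4 != a2)."""
    a1, a2, a3, a4 = map(Fraction, (a1, a2, a3, a4))
    c = a4 - a2                                  # polynomial 1: x4 = a4 - a2
    if c == 0:
        raise ValueError("this sketch only covers the case a4 != a2")
    s = a1 + a2 + a3
    k = a1*a2 + a2**2 + a2*a3 + a1*a3 - a2*a4    # constant term of q4
    sols = []
    for x3 in sorted({a1, a3, a4}):              # roots of q7
        x2 = -(x3**2 - s*x3 + k) / c             # q4 = 0 solved for x2
        x1 = (a1 + a3 + a4) - x2 - x3 - c        # p2 = 0 solved for x1
        sols.append((x1, x2, x3, c))
    return sols


def residuals(sol, a1, a2, a3, a4):
    """Values of the input polynomials p1..p4 at a candidate solution."""
    x1, x2, x3, x4 = sol
    return (
        x4 - (a4 - a2),
        x1 + x2 + x3 + x4 - (a1 + a3 + a4),
        x1*x3*x4 - a1*a3*a4,
        x1*x3 + x1*x4 + x2*x3 + x3*x4 - (a1*a4 + a1*a3 + a3*a4),
    )


params = (1, 2, 3, 5)                            # hypothetical sample values
for sol in generic_solutions(*params):
    assert all(r == 0 for r in residuals(sol, *params))
```

Exact rational arithmetic is used so that the residual test is an equality check rather than a tolerance check.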

Example 2: Let us consider another example from [4] to contrast the
proposed approach with Chou and Gao's approach. Consider the following
set of polynomials:

\[ \{\, y^2 - zxy + x^2 + z - 1,\quad xy + z^2 - 1,\quad y^2 + x^2 + z^2 - r^2 \,\}, \]

in which $r$ is a parameter. The computations are performed using the
lexicographic term ordering defined by $y > x > z$.

Unambiguous constrained polynomials generated from the input are:

1: $\langle \emptyset,\ y^2 - zxy + x^2 + z - 1 \rangle$,  2: $\langle \emptyset,\ xy + z^2 - 1 \rangle$,
3: $\langle \emptyset,\ y^2 + x^2 + z^2 - r^2 \rangle$.

Polynomial 1 can be simplified using 3 and 2 in the classical way
(because of empty constraint sets) to give: 1': $\langle \emptyset,\ z^3 - z^2 + r^2 - 1 \rangle$,
which replaces 1. The S-polynomial of 2 and 3 is also computed in the
classical way to produce: 4: $\langle \emptyset,\ z^2 y - y - x^3 - z^2 x + r^2 x \rangle$. The
S-polynomial of 1' and 4 gives a new polynomial:

5: $\langle \emptyset,\ zy - 2y + r^2 y + zx^3 - x^3 + z^3 x - z r^2 x - z^2 x + r^2 x \rangle$.

The S-polynomial of 2 and 4 gives a new polynomial:

6: $\langle \emptyset,\ x^4 + z^2 x^2 - r^2 x^2 - z^2 + z - r^2 z + 2 - r^2 \rangle$.

The S-polynomial of 3 and 4 reduces to 0. The S-polynomial of 2 and
6 also reduces to 0. Polynomial 5 simplifies 4 to: 4': $\langle \emptyset, q_4 \rangle$, where
\[ q_4 = (r^4 - 4r^2 + 3)\,y - (z^2 - (r^2 - 1)z + r^2 - 1)\,x^3 + ((r^2 - 1)z^2 - (r^4 - 2r^2 + 1)z - 2 + 2r^2)\,x; \]
4' replaces 4. Since 4' is ambiguous, equivalent
unambiguous polynomials are: 7: $\langle \{r^4 - 4r^2 + 3 \neq 0\}, q_4 \rangle$, and
8: $\langle \{r^4 - 4r^2 + 3 = 0\}, q_4 \rangle$. Using polynomial 7, polynomials 2, 3,
and 5 reduce to 0 under the constraint set $\{r^4 - 4r^2 + 3 \neq 0\}$.

The set {1', 2, 3, 5, 6, 7, 8} constitutes a parametric Gröbner basis. A
compactified Gröbner basis consists of {1', 2, 3, 4', 5, 6}.

⁶ Because of space limitations, definitions of comprehensive and reduced Gröbner
bases as given in [18] and ascending sets in [4] cannot be reproduced. The reader
may consult the original papers for definitions and other details.

For any specialization satisfying any of the constraint systems
$\{r^4 - 4r^2 + 3 \neq 0\}$ and $\{r^4 - 4r^2 + 3 = 0\}$, the equivalent set of polynomials
obtained from the above parametric Gröbner basis is a Gröbner basis.
These constraint sets obviously constitute a complete cover.

Solution sets can be generated from the above parametric Gröbner
basis. The dimension of the solution set is 0 irrespective of the value of
$r$. Since $r^4 - 4r^2 + 3 = (r^2 - 1)(r^2 - 3)$, each factor can be considered to
lead to a distinct case. For $r^2 - 1 = 0$, i.e. $r = 1, -1$, the Gröbner basis
is: {1': $z^3 - z^2$,  2: $xy + z^2 - 1$,  3: $y^2 + x^2 + z^2 - 1$,  4': $-z^2 x^3$,
5: $zy - y + zx^3 - x^3 + z^3 x - zx - z^2 x + x$,  6: $x^4 + z^2 x^2 - x^2 - z^2 + 1$}. By
factoring 1', we get $\{z = 0,\ x^4 - x^2 + 1 = 0,\ y + x^3 - x = 0\}$ giving one set
of solutions, and $\{z = 1,\ x = 0,\ y = 0\}$ as another solution. Similarly, for
$r^2 - 3 = 0$, the following solutions are obtained from its Gröbner basis:
$\{z = -1,\ x = 0,\ y^2 = 2\}$, $\{z = -1,\ x^2 = 2,\ y = 0\}$; there are additional
solutions for which the $z$ component satisfies $z^2 - 2z + 2 = 0$.
Similarly, for $(r^2 - 1)(r^2 - 3) \neq 0$, solutions can also be computed.
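The solutions listed for Example 2 can be substituted back into the input system as a sanity check. The sketch below is an illustration only: it evaluates the three input polynomials in floating-point complex arithmetic with an assumed tolerance, and the helper names `system` and `is_solution` are hypothetical. The root $x = e^{i\pi/6}$ is one root of $x^4 - x^2 + 1 = 0$.

```python
import cmath


def system(x, y, z, r2):
    """Values of the three input polynomials; r2 stands for r**2."""
    return (
        y*y - z*x*y + x*x + z - 1,
        x*y + z*z - 1,
        y*y + x*x + z*z - r2,
    )


def is_solution(x, y, z, r2, tol=1e-9):
    """True if all three polynomials vanish up to the tolerance."""
    return all(abs(v) < tol for v in system(x, y, z, r2))


# Case r^2 = 1: z = 1, x = 0, y = 0.
assert is_solution(0, 0, 1, 1)

# Case r^2 = 1: z = 0, x^4 - x^2 + 1 = 0, y = x - x^3.
x = cmath.exp(1j * cmath.pi / 6)
assert is_solution(x, x - x**3, 0, 1)

# Case r^2 = 3: z = -1 with (x = 0, y^2 = 2) or (x^2 = 2, y = 0).
root2 = cmath.sqrt(2)
assert is_solution(0, root2, -1, 3)
assert is_solution(root2, 0, -1, 3)
```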

The ascending sets given for this example in [4] are different from the
above parametric Gröbner basis; in particular, there are far too many of them.
Many of these ascending sets can be combined and described using a
single ascending set. The above description is much more compact and
comprehensive, much in the spirit of [16, 17].

Example 3: Here is a slightly more complicated example picked from [18].
Consider the set of two parametric polynomials: $\{f = vxy + ux^2 + x,\ g = uy^2 + x^2\}$,
in which $u, v$ are parameters and $x, y$ are variables. From $f$,
the unambiguous constrained polynomials are: 1: $\langle \{v \neq 0\}, f \rangle$,
2: $\langle \{v = 0,\ u \neq 0\}, f \rangle$, 3: $\langle \{v = 0,\ u = 0\}, f \rangle$. From $g$, the
unambiguous constrained polynomials are: 4: $\langle \{u \neq 0\}, g \rangle$,
5: $\langle \{u = 0\}, g \rangle$.
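The case split used above to obtain constrained polynomials 1 to 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used for the reported experiments: each coefficient is either a nonzero number or a string naming a single parameter expression, terms are listed head term first, and no consistency check against the incoming constraints is made. It also returns the simplified polynomial part (vanishing terms dropped), whereas the text keeps writing $f$ and $g$. The function name `split_unambiguous` is hypothetical.

```python
def split_unambiguous(constraints, terms):
    """Split <constraints, terms> into unambiguous cases.

    terms: list of (coeff, monomial) pairs, head term first. A coeff is
    either a nonzero number or a string naming a parameter expression.
    """
    cases = []
    cs = list(constraints)
    for k, (coeff, _mono) in enumerate(terms):
        if not isinstance(coeff, str):
            # Numeric nonzero coefficient: the head term is now certain.
            cases.append((cs, terms[k:]))
            return cases
        # Case 1: this coefficient is nonzero, so its term is the head term.
        cases.append((cs + [coeff + " != 0"], terms[k:]))
        # Case 2: it vanishes; continue with the next term.
        cs = cs + [coeff + " = 0"]
    cases.append((cs, []))          # every coefficient vanishes
    return cases


f_terms = [("v", "x*y"), ("u", "x^2"), (1, "x")]
g_terms = [("u", "y^2"), (1, "x^2")]
for cs, poly in split_unambiguous([], f_terms) + split_unambiguous([], g_terms):
    print(cs, poly)
```

For $f$ this produces three cases with constraint sets matching those of polynomials 1, 2 and 3, and for $g$ two cases matching 4 and 5.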

The S-polynomial of 1 and 4 is: $\langle \{u \neq 0,\ v \neq 0\},\ -vx^3 + u^2 x^2 y + uxy \rangle$;
its normal form is $\langle \{u \neq 0,\ v \neq 0\}, h \rangle$, where
$h = (u^3 + v^2)x^3 + 2u^2 x^2 + ux$. This constrained polynomial is ambiguous; the
unambiguous constrained polynomials equivalent to it are:
6: $\langle \{v \neq 0,\ u \neq 0,\ u^3 + v^2 \neq 0\}, h \rangle$, 7: $\langle \{v \neq 0,\ u \neq 0,\ u^3 + v^2 = 0\}, h \rangle$.
The S-polynomial of 1 and 5 is: $\langle \{v \neq 0,\ u = 0\},\ ux^3 + x^2 \rangle$, which reduces to 0 using 5.
Polynomial 5 reduces by polynomial 3 to $\{\langle \{v = 0,\ u = 0\}, 0 \rangle,\ \langle \{v \neq 0,\ u = 0\}, x^2 \rangle\}$.
So we can replace 5 by 5': $\langle \{v \neq 0,\ u = 0\}, x^2 \rangle$.
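The S-polynomial of 1 and 4 and its relation to $h$ can be verified with a few lines of exact polynomial arithmetic. The sketch below is illustrative: it treats $u, v, x, y$ uniformly as indeterminates, stores a polynomial as a dictionary from exponent tuples to integer coefficients, and checks the identity $h = (u^2 x + u) f - v\,S$, where $S = uyf - vxg$. The helper names are assumptions.

```python
from collections import defaultdict


def poly(*terms):
    """Build a polynomial from (coeff, (eu, ev, ex, ey)) terms."""
    p = defaultdict(int)
    for c, e in terms:
        p[e] += c
    return {e: c for e, c in p.items() if c != 0}


def add(p, q, sign=1):
    """Return p + sign * q."""
    r = defaultdict(int, p)
    for e, c in q.items():
        r[e] += sign * c
    return {e: c for e, c in r.items() if c != 0}


def mul(p, q):
    """Return the product p * q."""
    r = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            r[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return {e: c for e, c in r.items() if c != 0}


# Exponent order in each tuple: (u, v, x, y).
f = poly((1, (0, 1, 1, 1)), (1, (1, 0, 2, 0)), (1, (0, 0, 1, 0)))  # vxy + ux^2 + x
g = poly((1, (1, 0, 0, 2)), (1, (0, 0, 2, 0)))                     # uy^2 + x^2
h = poly((1, (3, 0, 3, 0)), (1, (0, 2, 3, 0)),
         (2, (2, 0, 2, 0)), (1, (1, 0, 1, 0)))       # (u^3+v^2)x^3 + 2u^2x^2 + ux

uy = poly((1, (1, 0, 0, 1)))
vx = poly((1, (0, 1, 1, 0)))
spoly = add(mul(uy, f), mul(vx, g), sign=-1)         # S = u*y*f - v*x*g

# Reducing S by f and clearing the denominator v gives h.
mult = poly((1, (2, 0, 1, 0)), (1, (1, 0, 0, 0)))    # u^2 x + u
v = poly((1, (0, 1, 0, 0)))
assert add(mul(mult, f), mul(v, spoly), sign=-1) == h
```

The identity also shows directly that $h$ lies in the ideal generated by $f$ and $g$.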

The S-polynomial of 1 and 6 is:
$\langle \{u \neq 0,\ v \neq 0,\ u^3 + v^2 \neq 0\},\ (u^3 + v^2)ux^4 + (u^3 + v^2)x^3 - 2u^2 v x^2 y - uvxy \rangle$,
which simplifies to 0 by using
constrained polynomials 1 and 6. Similarly, the S-polynomial of 1 and
7 also simplifies to 0.

It can be easily verified that the set of unambiguous constrained polynomials
{1, 2, 3, 4, 5', 6, 7} constitutes a parametric Gröbner basis because
every S-polynomial reduces to 0. For the case when
$\{u \neq 0,\ v \neq 0,\ u^3 + v^2 \neq 0\}$, {1, 4, 6} is a Gröbner basis. Its solution set is
zero-dimensional since the Gröbner basis includes polynomials whose head
terms are powers of $x$ and $y$. For $\{u \neq 0,\ v \neq 0,\ u^3 + v^2 = 0\}$, {1, 4, 7}
is a Gröbner basis, and its solution set is also zero-dimensional. In the
case when $\{u \neq 0,\ v = 0\}$, {2, 4} is a Gröbner basis and its solution set
is also zero-dimensional. The set {1, 5'} is a Gröbner basis for the case
$\{u = 0,\ v \neq 0\}$, and for $\{u = 0,\ v = 0\}$, {3} is a Gröbner basis. In
both cases, the solution set is of dimension 1 since the Gröbner bases do
not include a polynomial whose head term is a power of $y$ alone.

Constrained polynomials in the above parametric Gröbner basis can
be compactified easily. For example, constrained polynomials 6 and 7
can be compactified to $\langle \{u \neq 0,\ v \neq 0\}, h \rangle$; constrained polynomials
1, 2, 3 can be compactified to the unconstrained polynomial $f$; similarly, 4
and 5' can be compactified to $g$. Compactification would result in
$\{f,\ g,\ \langle \{u \neq 0,\ v \neq 0\}, h \rangle\}$ as a Gröbner basis. The reader can
check that for different constraints on parameters, the resulting Gröbner
basis from the compactified parametric Gröbner basis, after deleting
redundant elements, is the same as above.

The above parametric Gröbner basis of $\{f, g\}$ is different from a
Gröbner system, a reduced Gröbner system, as well as a comprehensive
Gröbner basis given in [18]. This suggests that even though there may
be many similarities in the two approaches, they are different as they
produce  different  answers.  In  particular,  Weispfenning's  algorithm  pro¬ 
duced  the  polynomial  2t;(w^  -f  v^)x^y  -f  2{u^  -f-  v^)x^  —  ux  in  a  Grobner 
system as well as a comprehensive Gröbner basis which is not in the
above answer. Further, there are many additional polynomials in the
result reported in [18].

1.5  Characteristic Set Construction

The concept of a characteristic set of a set of polynomials was introduced
by Ritt in [15], and has been popularized by Wu because of its success
in geometry theorem proving as well as in solving systems of polynomial
equations [19, 20]. In [8], we developed a characterization of characteristic
sets as defined by Ritt, which is different from Wu's definition of a
characteristic set. The characteristic set algorithm given by Wu transforms
a system of polynomial equations into a triangular form whose
zero set is "roughly equivalent" to the zero set of the original system.
Below, we discuss characteristic sets in Wu's sense as the readers are
likely to have better familiarity with them. But the approach discussed
below extends to Ritt's characteristic sets discussed in [8].⁷ We extend
characteristic set construction to constrained polynomials. We assume
the reader is familiar with the characteristic set method; for details, the
reader may consult [15, 19, 10].

1.5.1  Definitions for recursive representation of polynomials

In characteristic set computations, recursive representation of polynomials
is used. A total ordering on variables is assumed to be given. A
polynomial is viewed as a univariate polynomial in the highest variable
appearing in it. Degree, head term, head monomial and initial of a constrained
polynomial are defined using recursive representation similar to
the definitions for distributed representation.

Let $>$ be a total ordering on variables, $x_1 < x_2 < \cdots < x_n$. A
constrained polynomial $\langle C, p \rangle$ is defined to be of potential class $i$ if
and only if (i) for every variable $x_j > x_i$, the coefficient $a_j$ of a term
$x_j^{d_j}$ in $p$ is 0 with respect to $C$, where $d_j > 0$, i.e. $\langle C, a_j \rangle = \langle C, 0 \rangle$,
and (ii) there is a term $x_i^{d_i}$, where $d_i > 0$, whose coefficient $a_i$ in $p$ is
such that $\langle C, a_i \rangle$ is not equivalent to $\langle C, 0 \rangle$. The coefficient $a_i$ is
a polynomial in parameters as well as variables. The highest such term
$x_i^{d_i}$ is called the potential head term of $\langle C, p \rangle$, and $d_i$ its potential
degree; the coefficient $a_i$ is called the potential initial of $\langle C, p \rangle$.

⁷ Ritt's definition of a characteristic set of a set $E$ of polynomials is that in addition
to being a chain, every polynomial in the ideal generated by $E$ pseudodivides to 0
using the characteristic set. That is not necessarily the case for a characteristic set
as defined by Wu's algorithm.
The major difference in the algorithm discussed in [8] for constructing Ritt's
characteristic set is that when a chain is used for pseudodivision, it is ensured that the
initial of a polynomial in the chain is invertible with respect to the polynomials lower
in the chain. This may result in splitting of the system of polynomials into many
subsystems.

For example, consider $p = ux^2 y + v^2 x^2 y + vy + ux$ discussed earlier,
assuming $y > x$. In recursive representation, $p$ is viewed as a univariate
polynomial in $y$ and is written as: $((u + v^2)x^2 + v)y + ux$; the degree of $p$
relative to $\{u + v^2 \neq 0\}$ is 1, and the head term is $y$. The head coefficient
of $p$, also called its initial, is $((u + v^2)x^2 + v)$. The degree of $p$ relative
to $\{v \neq 0\}$ is also 1, and the head term is $y$; the head coefficient of $p$
remains $((u + v^2)x^2 + v)$ since $((u + v^2)x^2 + v) \neq 0$. The potential
class of $\langle \emptyset, p \rangle$ is 2; its potential degree is 1, and the potential initial
is $((u + v^2)x^2 + v)$.
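For a fixed specialization of the parameters, the class, degree and initial of this $p$ can be read off mechanically. The following sketch is an illustration under assumptions: parameters are given numeric values, the ordering is $x < y$ (so class 1 means $x$ and class 2 means $y$), the initial is returned as a dictionary from powers of $x$ to coefficients, and constants are assigned class 0 by convention. The function names are hypothetical.

```python
def specialize(u, v):
    """Coefficients of p = ((u + v^2) x^2 + v) y + u x as {(deg_x, deg_y): c},
    with terms whose coefficient vanishes deleted (the simplified form)."""
    coeffs = {(2, 1): u + v * v, (0, 1): v, (1, 0): u}
    return {e: c for e, c in coeffs.items() if c != 0}


def head_data(p):
    """Return (class, degree, initial) for the ordering x < y.

    The initial is a dict {deg_x: coeff}; the zero polynomial gives (0, 0, {}).
    """
    if not p:
        return 0, 0, {}
    deg_y = max(ey for _, ey in p)
    if deg_y > 0:
        # Coefficient of the highest power of y, itself a polynomial in x.
        initial = {ex: c for (ex, ey), c in p.items() if ey == deg_y}
        return 2, deg_y, initial
    deg_x = max(ex for ex, _ in p)
    if deg_x > 0:
        return 1, deg_x, {0: p[(deg_x, 0)]}
    return 0, 0, {0: p[(0, 0)]}


print(head_data(specialize(1, 1)))    # u + v^2 != 0
print(head_data(specialize(-1, 1)))   # u + v^2 == 0, v != 0
print(head_data(specialize(0, 0)))    # u + v^2 == 0, v == 0
```

The three calls correspond to the three situations that can arise for the initial $(u + v^2)x^2 + v$: it keeps its $x^2$ term, it collapses to $v$, or the whole polynomial vanishes.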

Given a constrained polynomial $\langle C, p \rangle$ in recursive form, an equivalent
simplified form $\langle C, q \rangle$ can be defined in a way similar to the one
for distributed representation: $\langle C, q \rangle$ is obtained from $\langle C, p \rangle$ by
deleting terms whose coefficients are 0 with respect to $C$. For example,
the simplified form of $\langle \{u + v^2 = 0,\ v \neq 0\}, p \rangle$ is
$\langle \{u + v^2 = 0,\ v \neq 0\},\ vy + ux \rangle$, and the simplified form of
$\langle \{u + v^2 = 0,\ v = 0\}, p \rangle$ is $\langle \{u + v^2 = 0,\ v = 0\}, 0 \rangle$.

1.5.2  Ambiguous and unambiguous constrained polynomials

If there is a consistent superset $D$ of $C$ such that $\langle D, a_i \rangle$ is equivalent
to $\langle D, 0 \rangle$, then $p$ is ambiguous. Otherwise, $\langle C, a_i \rangle$ cannot be
made equivalent to 0, and $p$ is unambiguous, in which case $i$, $x_i^{d_i}$, $d_i$ and
$a_i$ are, respectively, the class, head term, degree, and initial of $p$ also.
From an ambiguous constrained polynomial $\langle C, p \rangle$ (implying that its
initial $a_i$ is such that $\langle C, a_i \rangle$ is ambiguous), unambiguous constrained
polynomials equivalent to it can be generated. Unlike in the distributed
representation case, we cannot simply require that $a_i \neq 0$ since $a_i$ is itself a
polynomial in parameters and variables. In fact, it is necessary to keep
checking the head coefficient until a constraint expression is arrived
at; then the constraint set $C$ is appropriately extended as in the case of
distributed representation. In general, the constraint set $C$ is extended
based on the structure of $a_i$ to make $a_i$ unambiguous with respect to an
extended constraint set that includes $C$. This process can be repeated
till all constrained polynomials thus obtained are unambiguous. Corresponding
to an ambiguous constrained polynomial, there is a finite set
of unambiguous constrained polynomials equivalent to it.

For example, assuming $y > x$, consider $p = ((u + v^2)x^2 + v)y + ux$.
The constrained polynomial $\langle \emptyset, p \rangle$ is ambiguous because its initial,
the polynomial $(u + v^2)x^2 + v$, is ambiguous with respect to the empty
set of constraints. However, $(u + v^2)x^2 + v$ is nonzero with respect to
$\{u + v^2 \neq 0\}$; so $\langle \{u + v^2 \neq 0\}, p \rangle$ is unambiguous, its degree is 1,
and its initial is $(u + v^2)x^2 + v$. Similarly, the initial remains nonzero
with respect to the constraint set $\{u + v^2 = 0,\ v \neq 0\}$; the degree of
$\langle \{u + v^2 = 0,\ v \neq 0\}, p \rangle$ is also 1, and the initial remains
$(u + v^2)x^2 + v$, which is equivalent to $v$ under this constraint set.

The initial $(u + v^2)x^2 + v$ becomes zero with respect to the constraint
set $\{u + v^2 = 0,\ v = 0\}$, so the degree of $\langle \{u + v^2 = 0,\ v = 0\}, p \rangle$
drops to 0, and the potential initial is $ux$, which is zero with respect to
$\{u + v^2 = 0,\ v = 0\}$. So $\langle \{u + v^2 = 0,\ v = 0\}, p \rangle$ is equivalent to 0.

Compactification of unambiguous constrained polynomials into ambiguous
constrained polynomials can be defined for recursive representation
also.

1.5.3  Pseudodivision

There are three important concepts in characteristic set computations:
(i) a polynomial being reduced with respect to another polynomial, (ii)
the reduction of a polynomial by another polynomial by pseudodivision
[12], and (iii) the initial of a polynomial being invertible [8]. The first
two must be extended for Wu's algorithm.

A constrained polynomial $\langle C, p \rangle$ is reduced with respect to an
unambiguous $\langle D, q \rangle$ if and only if (i) the constraint set $C \cup D$ is
consistent, and (ii) the class of $\langle D, q \rangle$ is $i$ and the potential degree of
$x_i$ in $\langle C \cup D, p \rangle$ is less than the degree of $x_i$ in $\langle D, q \rangle$.

An unambiguous $\langle C, p \rangle$ reduces (pseudodivides) by another unambiguous
$\langle D, q \rangle$ to $\langle C \cup D, r \rangle$ if (i) $C \cup D$ is consistent, (ii) $\langle C, p \rangle$
is not reduced with respect to $\langle D, q \rangle$, and (iii) there exist a number
$k$ (preferably the smallest such number, but $k \leq (d - d_i) + 1$, where $i$ is
the class of $\langle D, q \rangle$, $d_i$ is the degree of $\langle D, q \rangle$, and $d$ is the degree of
$x_i$ in $\langle C, p \rangle$), a polynomial $b$ (preferably not involving parameters)
and another polynomial $r$ such that

\[ \langle C \cup D,\ I_q^{k}\, p \rangle = \langle C \cup D,\ b\,q \rangle + \langle C \cup D,\ r \rangle, \]

where $I_q$ is the initial of $\langle C \cup D, q \rangle$, and the potential degree of $x_i$
in $r$ is less than $d_i$. This calculation can be done in the classical way by first
simplifying $p$ and $q$ with respect to $C \cup D$. Note that $\langle C \cup D, r \rangle$ need
not be unambiguous.
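The classical pseudodivision step underlying this definition can be sketched for the simplest setting. The code below is a minimal illustration, assuming univariate polynomials with integer coefficients stored as lists (lowest degree first) and no parameters or constraint sets; in the parametric setting the same loop would be run after simplifying $p$ and $q$ with respect to $C \cup D$. The exponent $k$ returned counts the elimination steps, so it satisfies the bound $k \leq (d - d_i) + 1$ but is not necessarily the smallest possible.

```python
def trim(p):
    """Drop trailing zero coefficients (lists are lowest degree first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def pseudo_divide(p, q):
    """Return (k, b, r) with lead(q)**k * p == b*q + r and deg r < deg q."""
    r, q = trim(p), trim(q)
    if not q:
        raise ZeroDivisionError("pseudodivision by the zero polynomial")
    dq, lead = len(q) - 1, q[-1]
    b, k = [], 0
    while len(r) > dq:                    # i.e. deg r >= deg q
        shift, c = len(r) - 1 - dq, r[-1]
        # Multiply the running identity by the initial of q ...
        r = [lead * a for a in r]
        b = [lead * a for a in b]
        k += 1
        # ... then cancel the leading term of r with c * x^shift * q.
        for i, a in enumerate(q):
            r[i + shift] -= c * a
        b += [0] * (shift + 1 - len(b))
        b[shift] += c
        r = trim(r)
    return k, b, r


# 2^2 * (x^2 + 1) = (2x - 1)(2x + 1) + 5
print(pseudo_divide([1, 0, 1], [1, 2]))
```

Working over the integers avoids fractions entirely, which is the point of multiplying by the initial instead of dividing by it.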




It is easy to see that for every specialization $\sigma$ satisfying $C \cup D$, the
common zeros of $\sigma(p)$ and $\sigma(q)$ are also the zeros of the remainder $\sigma(r)$;
further, if $r = 0$, then the common zeros of $\sigma(p)$ and $\sigma(q)$ are the same
as the zeros of $\sigma(q)$ insofar as they are not the zeros of $\sigma(I_q)$. And, $\sigma(r)$
is in the ideal generated by $\sigma(p)$ and $\sigma(q)$. Unlike in the case of Gröbner
basis computations, $\langle C \cup D, r \rangle$ does not replace any of $\langle C, p \rangle$ and
$\langle D, q \rangle$; instead, it gets added to the basis.

In order for the definitions to extend naturally, we define that an
unambiguous constrained polynomial $\langle C, p \rangle$ reduces by another
unambiguous $\langle D, q \rangle$ to 0 if $C \cup D$ is inconsistent. Thus an unambiguous
constrained polynomial $\langle C, p \rangle$ is not reduced with respect to another
unambiguous $\langle D, q \rangle$ if their constraint sets clash.

Definition 3  A set $\{\langle C_1, p_1 \rangle, \ldots, \langle C_l, p_l \rangle\}$ of unambiguous constrained
polynomials is called a chain if (i) $l = 1$ and $\langle C_1, p_1 \rangle$ is not
equivalent to 0, or (ii) $0 < i_1 < i_2 < \cdots < i_l$, where $i_j$ is the class
of $\langle C_j, p_j \rangle$, and $D = C_1 \cup \cdots \cup C_l$ is consistent, and $\langle C_j, p_j \rangle$ is
reduced with respect to each $\langle C_i, p_i \rangle$, $i < j$.

A chain is also called a triangular form; each element in the chain is
said to introduce a new variable, i.e. $\langle C_j, p_j \rangle$ introduces $x_{i_j}$. Also
notice that the chain is relevant for only those parameter values that
satisfy the constraint set $D$. The above definition can be weakened to
define a chain in the loose sense or a chain in the weak sense as defined
by Wu for polynomials, in which only the initial of $\langle C_j, p_j \rangle$ is reduced
with respect to the $\langle C_i, p_i \rangle$'s [19].

Pseudodivision of a constrained polynomial can be defined with respect
to a chain in the obvious way. The constrained polynomial is
pseudodivided by the polynomial of the largest class in the chain, then
the result is pseudodivided by the next polynomial lower in the chain and
so on, until a result that is reduced with respect to every polynomial
in the chain is generated.

1.5.4  Computing Parametric Characteristic Set

A parametric characteristic set should cover all values of parameters
satisfying parametric constraints if given as a part of the input. A parametric
characteristic set may thus include many chains, but they are
disjoint in covering different parameter values. A parametric characteristic
set is defined based on the definition of a characteristic set.

Definition 4  A finite set $CCP$ of unambiguous constrained polynomials
is a parametric characteristic set if (i) for any two nonequivalent
$\langle C, p \rangle$ and $\langle D, q \rangle$ of class $i$ in $CCP$, $C \cup D$ is inconsistent, and (ii)
for every specialization $\sigma$ of parameters, $\sigma(CCP)$ is equivalent to a set
$CP$ of polynomials that constitutes a characteristic set.

Definition 5  A finite set $CCP$ of unambiguous constrained polynomials
is a parametric characteristic set of a finite set $BCP$ of parametric
polynomials if and only if (i) $CCP$ is a parametric characteristic set,
and (ii) for every specialization $\sigma$, $\sigma(CCP)$ is equivalent to a set $CP$ of
polynomials that constitutes a characteristic set of the set of polynomials
equivalent to $\sigma(BCP)$.

A parametric characteristic set can be computed using a method similar
to the one used to compute a characteristic set of polynomials,
as discussed in [19, 20, 3, 11]. A simple algorithm for computing a
parametric characteristic set works as follows. Firstly, for every ambiguous
polynomial, generate an equivalent set of unambiguous polynomials.
Secondly, identify a minimal chain of constrained polynomials in the basis:
from a given set $BCP$ of constrained polynomials, let $S = BCP$,
$B = \emptyset$, $CB = \emptyset$.

1.  Pick a minimal element, say $\langle C, p \rangle$, in $S$, i.e. a constrained polynomial
of the smallest class, say $i$, and of the smallest degree, say $d_i$, in its largest
variable $x_i$, and include it in $B$. Let $CB = CB \cup C$.

2.  Remove from $S$ any constrained polynomial $\langle D, q \rangle$ such that the
degree of $x_i$ in $\langle D, q \rangle$ is $\geq d_i$ or $CB \cup D$ is inconsistent.

3.  Repeat the above two steps until $S$ is empty.

It can be easily shown that $B \subseteq BCP$ is a chain. Polynomials in $BCP$
are pseudodivided by $B$. Let $T$ be the set of nonzero remainders obtained.
If $T$ is $\emptyset$, then we have identified a chain belonging to a parametric
characteristic set. Otherwise $BCP$ is augmented to include $T$, and the
above procedure of identifying a chain in $BCP$ is repeated. This process
terminates since $T$ includes polynomials of lower degrees than those in
$B$.
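The chain selection loop (steps 1 to 3 above) can be sketched as follows. This is a deliberately simplified illustration, not the implementation behind the reported experiments: each constrained polynomial is reduced to a name, a set of atomic constraints written as strings such as `"u=0"` or `"u!=0"`, and a precomputed tuple of degrees in $(x_1, x_2, \ldots)$; degrees are not re-simplified under the accumulated constraints, constants (class 0) are assumed absent, and the consistency test only detects an atom together with its negation. All names are hypothetical.

```python
from collections import namedtuple

# degs = (degree in x1, degree in x2, ...)
CP = namedtuple("CP", "name constraints degs")


def consistent(cs):
    """Toy check: only detects an atom 'e=0' together with 'e!=0'."""
    return not any(c.endswith("!=0") and c[:-3] + "=0" in cs for c in cs)


def cls(cp):
    """Class = 1-based index of the highest variable with positive degree."""
    pos = [i for i, d in enumerate(cp.degs, start=1) if d > 0]
    return max(pos) if pos else 0


def select_chain(polys):
    """Return (names of the chain B, accumulated constraint set CB)."""
    S, chain, CB = list(polys), [], frozenset()
    while S:
        # Step 1: smallest class, then smallest degree in the class variable.
        pick = min(S, key=lambda cp: (cls(cp), cp.degs[cls(cp) - 1]))
        i, d_i = cls(pick), pick.degs[cls(pick) - 1]
        chain.append(pick.name)
        CB = CB | pick.constraints
        # Step 2: keep only polynomials reduced w.r.t. pick and consistent with CB.
        S = [cp for cp in S
             if cp.degs[i - 1] < d_i and consistent(CB | cp.constraints)]
    return chain, CB


# f = vxy + ux^2 + x and g = uy^2 + x^2 with x1 = x, x2 = y, split into cases.
polys = [
    CP("1", frozenset({"v!=0"}), (2, 1)),
    CP("2", frozenset({"v=0", "u!=0"}), (2, 0)),
    CP("3", frozenset({"v=0", "u=0"}), (1, 0)),
    CP("4", frozenset({"u!=0"}), (2, 2)),
    CP("5", frozenset({"u=0"}), (2, 0)),
]
print(select_chain(polys))
```

On this input the loop picks polynomial 3 first and then discards everything else, so the chain consists of 3 alone with the constraint set containing `u=0` and `v=0`.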

If a chain $B$ in $BCP$ is identified such that every constrained polynomial
in $BCP$ has a clashing constraint set with $B$ or pseudodivides
to 0 using $B$, we extract that chain out of $BCP$ and compute the
remaining chains. $B$ is a chain in a parametric characteristic set of $BCP$;
the constraint set $CB$ associated with $B$ represents parameter values
for which $B$ is a characteristic set. For every specialization $\sigma$ satisfying
$CB$, every polynomial in the set $S$ of polynomials equivalent to $\sigma(BCP)$
pseudodivides to 0 using $B'$, the set of polynomials equivalent to $\sigma(B)$.

One simple (but naive) way to extract out $B$ from $BCP$ is to define

\[ BCP' = \{\, \langle D, p \rangle \mid \langle C, p \rangle \in BCP,\ D = C \cup \{\mathit{neg}(cb)\} \text{ is consistent},\ cb \in CB,\ \langle D, p \rangle \neq \langle D, 0 \rangle \,\}. \]

It is easy to see that for every specialization $\sigma$ satisfying $CB$, $\sigma(BCP')$
is the empty set, and further, for every specialization $\sigma$ not satisfying
$CB$, $\sigma(BCP') = \sigma(BCP)$.⁸

If $BCP'$ is not the empty set, assign $BCP'$ to $S$ and repeat the above
steps  of  identifying  and  extracting  out  additional  chains  of  a  paramet¬ 
ric  characteristic  set.  The  set  of  all  chains  collected  in  this  way  is  a 
parametric  characteristic  set. 

Many  variations  of  the  above  simple  algorithm  are  possible  and  many 
heuristics  can  be  incorporated.  Many  examples,  including  those  in  [18, 
4],  have  been  successfully  worked  out  using  the  above  algorithm. 

1.5.5  Example 

Let us consider Example 3 of the previous section used for illustrating Gröbner basis computations, and apply the characteristic set construction. The input is $\{f = vxy + ux^2 + x,\ g = uy^2 + x^2\}$ with $y > x$. Unambiguous constrained polynomials equivalent to $f$ are:

1: $\langle \{v \neq 0\}, f \rangle$, 2: $\langle \{v = 0, u \neq 0\}, f \rangle$, 3: $\langle \{v = 0, u = 0\}, f \rangle$.

Unambiguous constrained polynomials equivalent to $g$ are:

4: $\langle \{u \neq 0\}, g \rangle$, 5: $\langle \{u = 0\}, g \rangle$.

Applying the above algorithm, we get $\{3\}$ as a chain of a parametric characteristic set for the case when $\{u = 0, v = 0\}$. This chain is extracted, giving a replacement of 5 by $5' : \langle \{u = 0, v \neq 0\}, g \rangle$. From $\{1, 2, 4, 5'\}$, another chain $\{5', 1\}$ for the case when $\{u = 0, v \neq 0\}$ is generated. After extracting this chain, we have $\{1', 2, 4\}$, where $1' : \langle \{u \neq 0, v \neq 0\}, f \rangle$. There is another chain $\{2, 4\}$ for $\{u \neq 0, v = 0\}$. After extracting this chain, we get $\{1', 4'\}$, where $4' : \langle \{u \neq 0, v \neq 0\}, g \rangle$.

* As in the case of Gröbner basis construction, a compact representation of $BCP'$ can be constructed using constraints which are propositional formulas over basic constraints.

Polynomial $1'$ is used to pseudodivide $4'$ and the result is $\langle \{u \neq 0, v \neq 0\}, r \rangle$, where $r = (u^3 + v^2)x^4 + 2u^2x^3 + ux^2$. Since the result is ambiguous, we get additional unambiguous polynomials equivalent to it: $6 : \langle \{u \neq 0, v \neq 0, u^3 + v^2 \neq 0\}, r \rangle$, and $7 : \langle \{v \neq 0, u \neq 0, u^3 + v^2 = 0\}, r \rangle$. There are two more chains: $\{1', 6\}$ for the constraint set $\{v \neq 0, u \neq 0, u^3 + v^2 \neq 0\}$, and $\{1', 7\}$ for the constraint set $\{u \neq 0, v \neq 0, u^3 + v^2 = 0\}$.
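The remainder $r$ can be checked by hand. The following short derivation is a sketch, assuming the reading $f = vxy + ux^2 + x$, $g = uy^2 + x^2$ of the example, with $y$ as the main variable so that the initial of $f$ is $vx$:

```latex
\begin{align*}
(vx)^2\, g &= u\,(vxy)^2 + v^2x^4, \qquad vxy \equiv -(ux^2 + x) \pmod{f},\\
r &= u\,(ux^2 + x)^2 + v^2x^4 \\
  &= (u^3 + v^2)\,x^4 + 2u^2x^3 + ux^2 .
\end{align*}
```

The leading coefficient $u^3 + v^2$ is what forces the further case split on whether $u^3 + v^2$ vanishes.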

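A parametric characteristic set consists of:

$3 : \langle \{u = 0, v = 0\}, f \rangle$, $5' : \langle \{u = 0, v \neq 0\}, g \rangle$,

$2 : \langle \{v = 0, u \neq 0\}, f \rangle$, $4'' : \langle \{u \neq 0, v = 0\}, g \rangle$,

$1'' : \langle \{u = 0, v \neq 0\}, f \rangle$, $1''' : \langle \{u \neq 0, v \neq 0, u^3 + v^2 \neq 0\}, f \rangle$,

$1'''' : \langle \{u \neq 0, v \neq 0, u^3 + v^2 = 0\}, f \rangle$,

$6 : \langle \{v \neq 0, u \neq 0, u^3 + v^2 \neq 0\}, r \rangle$,

$7 : \langle \{v \neq 0, u \neq 0, u^3 + v^2 = 0\}, r \rangle$.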

In [20], Wu gave a structure theorem and showed how it could be used
for  computing  zeros  of  polynomial  equations  from  their  characteristic 
sets.  This  construction  can  be  extended  to  constrained  polynomials. 
Below  we  illustrate  how  solutions  can  be  obtained  from  some  of  the 
chains  in  the  above  parametric  characteristic  set. 

The chain $\{1''', 6\}$ can be used to generate the zeros as follows: For $1'''$, the initial $vx$ should be nonzero, which requires that $x \neq 0$; this condition can be used to simplify $\{1''', 6\}$ to give a new set $\{1''', 6'\}$, where $6' : \langle \{v \neq 0, u \neq 0, u^3 + v^2 \neq 0\}, (u^3 + v^2)x^2 + 2u^2x + u \rangle$. From constrained polynomials $1'''$ and $6'$, zeros can be computed by solving $6'$ and then substituting the values of $x$ to get the corresponding values of $y$.

It is also necessary to consider the case when the initial vanishes, i.e. $x = 0$. It is not enough to add the polynomial $x = 0$ to the chain $\{1''', 6\}$; instead, the condition that the initial is zero needs to be added to the whole basis from which the chain was generated. Alternately, $x = 0$ can be added to the original set and a parametric characteristic set can be computed for $\{v \neq 0, u \neq 0, u^3 + v^2 \neq 0\}$.

When $x = 0$ is added to the original basis, $f$ becomes 0, and $g$ becomes $uy^2$. We thus get a parametric characteristic set consisting of $\langle \{u \neq 0\}, uy^2 \rangle$, $\langle \emptyset, x \rangle$, which gives the zero $x = 0$, $y = 0$ for the case when $u \neq 0$. For the case when $u = 0$, we get a 1-dimensional zero set, which is $x = 0$.

From the chain $\{1'''', 7\}$, we get $7'$ under the condition that $x \neq 0$: $\langle \{u \neq 0, v \neq 0, u^3 + v^2 = 0\}, 2u^2x + u \rangle$, which gives a zero-dimensional zero set whose $x$ component is $-1/(2u)$. Zero sets can be computed from other chains in a similar manner. Chain $\{3\}$ for the case $u = 0$, $v = 0$ also gives a 1-dimensional zero set, which is $x = 0$.
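The zero obtained from this last chain can be checked numerically. The sketch below is illustrative only: it assumes the example polynomials are $f = vxy + ux^2 + x$ and $g = uy^2 + x^2$, and the helper name `zero_from_chain_7` is hypothetical. It uses exact rational arithmetic to pick parameters with $u^3 + v^2 = 0$, takes $x = -1/(2u)$, and recovers $y$ from $f = 0$.

```python
from fractions import Fraction

def f(u, v, x, y):
    return v * x * y + u * x**2 + x

def g(u, v, x, y):
    return u * y**2 + x**2

def zero_from_chain_7(u, v):
    """Common zero for u != 0, v != 0, u^3 + v^2 == 0, with x != 0."""
    u, v = Fraction(u), Fraction(v)
    assert u != 0 and v != 0 and u**3 + v**2 == 0
    x = Fraction(-1) / (2 * u)       # root of 2*u^2*x + u
    y = -(u * x**2 + x) / (v * x)    # solve f = 0 for y (initial v*x is nonzero)
    return x, y

u, v = -1, 1                         # satisfies u^3 + v^2 = 0
x, y = zero_from_chain_7(u, v)
print(x, y, f(u, v, x, y), g(u, v, x, y))
```

For these parameter values both `f` and `g` evaluate to zero at the computed point.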

For computing zero sets, optimizations such as factoring and splitting can be performed to speed up the computations of chains insofar as the zero sets are preserved.

1.6  Concluding  remarks  and  future  research 

A simple but powerful approach for solving systems of parametric polynomial equations has been discussed. The approach was applied to define and illustrate parametric Gröbner basis construction from a system of parametric polynomial equations. As was evident from the examples discussed above, different values for parameters may result in different systems and associated solution sets: for some parameter values the system may be inconsistent and thus have no solution, while for other parameter values the same system may have infinitely many solutions. The approach was also briefly illustrated for characteristic set constructions and generating solutions using characteristic sets. The same approach should extend to other symbolic elimination techniques including univariate resultants and multivariate resultants.

From  the  discussion,  the  reader  must  have  noticed  that  sometimes 
unnecessary  and  excessive  branching  may  be  done  because  it  is  guided 
by  considering  the  coefficient  of  the  leading  term  in  a  parametric  poly¬ 
nomial.  Further  research  is  necessary  to  study  how  branching  can  be 
avoided.  In  [16,  17],  some  ideas  are  discussed  in  the  context  of  solving 
linear  systems  of  parametric  equations,  which  may  be  useful  for  nonlin¬ 
ear  polynomial  equations  also. 

Given that symbolic computation algorithms often do not finish on many large examples (often because of lack of space, since large intermediate computations may be generated), numerical algorithms such as those based on homotopy and continuation techniques must be employed. It would be interesting to study how this approach can be used for solving systems of parametric polynomial equations using numerical methods.

Acknowledgements:  Comments  by  Lakshman  Y.N.  and  a  referee  on 
an  earlier  draft  helped  in  improving  the  presentation  of  the  paper. 
when do $f$ and $g$ have common roots? The question leads naturally to a condition that has to be satisfied by the coefficients of $f$ and $g$. This condition was discovered by Euler and is now commonly referred to as the vanishing of the Sylvester resultant of $f$ and $g$. The Sylvester resultant of $f$, $g$ is the determinant of the following matrix:

0. Unlike in Sylvester’s formulation, where the resultant of $f$ and $g$ is the determinant of an $(m + n) \times (m + n)$ matrix, in the Cayley-Dixon formulation, the resultant is obtained as the determinant of an $n \times n$ matrix.

example, consider a generic cubic polynomial:
Given a basis $F$ for an ideal $I$ and an admissible term ordering $\prec$, the algorithm returns a Gröbner basis for $I$ for the term ordering $\prec$.

where $G' = G \setminus \{g\}$, i.e. each polynomial in the basis $G$ is reduced with respect to all the other polynomials in $G$.
REASONING ABOUT NUMBERS IN TECTON


- Preliminary Version -

State University of New York at Albany
Albany,  New  York  12222,  U.S.A. 


Abstract 

We  discuss  algorithms  and  heuristics  for  reasoning  about  numbers  -  rationals,  integers,  and 
naturals,  and  the  associated  operations  +,  =,  >,  in  Tecton,  a  specification  and  verification 
system  currently  under  development.  A  main  objective  is  to  make  Tecton  more  efficient  for 
tasks  of  reasoning  about  specification  and  programs  that  involve  numbers.  We  extend  Fourier’s 
algorithm  for  deciding  satisfiability  of  linear  constraints  over  the  rationals  to  linear  constraints 
over  the  integers.  This  algorithm  serves  as  a  building  block  for  a  complete  decision  procedure  for 
universally  quantified  Presburger  arithmetic,  in  which  a  special  emphasis  is  placed  on  deducing 
equalities  and  using  them  as  rewrite  rules  for  simplification  and  elimination.  We  discuss  how  this 
decision  procedure  has  been  integrated  with  definitions  and  properties  of  interpreted  function 
symbols  specified  as  terminating  rewrite  rules  in  the  Tecton  system. 


1  Introduction 

In the summer of 1991, we implemented a set of algorithms and heuristics to automatically reason about numbers (rationals, naturals and integers) in Tecton [11], a verification and specification system which is being developed on top of our theorem prover Rewrite Rule Laboratory (RRL) [15]. This paper is a belated report on these algorithms and our approach for implementing them in a rewrite rule based theorem prover. In particular, we extend Fourier’s method for linear inequalities over the rationals, as explained in [20], to consider linear inequalities over the integers, and show its completeness as a decision procedure for the satisfiability problem of linear inequalities over the integers.^1 The extended algorithm is used as a building block for a decision procedure for universally quantified Presburger arithmetic with uninterpreted symbols. A particular emphasis is placed on deducing equalities so that they can be used as rewrite rules for simplification and for detecting unsatisfiability.

We discuss how the decision procedure for Presburger arithmetic has been integrated with terminating (conditional) rewrite rules defining interpreted function symbols and their inductive properties. In particular, we discuss the interaction between the decision procedure and conditional rewriting while normalizing a formula to prove its validity in the Tecton system. Many examples are given to illustrate different decision procedures and their combinations.

* Partially supported by National Science Foundation Grant No. CCR-9303394 and United States Air Force Office of Scientific Research Grant No. AFOSR-91-0361.

† Current address: Department of Computer Science, The Wichita State University, Wichita, Kansas 67208.

1 After writing an earlier draft of this paper, we recently learned (May 1994) that Williams had already reported such an extension in [28]. A careful study of Williams’s paper and Cooper’s paper [4] would reveal that Williams was rediscovering Cooper’s results and Presburger’s procedure as reported in [6]. This is not very surprising because Fourier’s method has been rediscovered many times in the mathematics literature.

A main objective in designing the Tecton system is to support construction of large complex proofs, such as those that typically are necessary in reasoning about specification and descriptions of generic components useful in software and hardware design [11]. Often, one needs to reason about natural numbers and integers, particularly in the context of linear data structures such as arrays and sequences, where indices are needed, and totally-ordered enumerative sets such as natural numbers and integers are natural choices for indices. Reasoning about numbers is also useful in reasoning about measures including size, length, depth, etc., defined on data structures. Using Presburger arithmetic, we have recently developed a method for checking completeness of function definitions defined on numbers using $0, s, +, >$ as terminating rewrite rules [10]. Using this method, it is possible to prove completeness of the function definitions used in an inductive proof of the unique prime factorization theorem done on RRL using the cover set method for automating induction [31]; this method can also be used to prove the completeness of cover sets. In another paper, we have developed methods using Presburger arithmetic for generating induction schemes and other heuristics for enhancing the inductive theorem proving capabilities of RRL and Tecton [14].

Algorithms for deciding properties of numbers were extensively discussed in the literature in the mid-70’s and early 80’s when there was considerable interest in program verification. A particular focus has been on a decision procedure for a subclass of Presburger arithmetic (in particular, the universally quantified theory of Presburger arithmetic). Cooper [4] gave an algorithm which improved upon his previous algorithms as well as an algorithm given in logic textbooks, e.g. [6]. Shostak [25] proposed a method for proving formulas in Presburger arithmetic using a procedure for solving linear inequalities over the rationals using the sup-inf method proposed by Woody Bledsoe [1, 2]. Subsequently, Shostak [26] gave an algorithm for Presburger arithmetic with uninterpreted function symbols, formulas handled by Cooper’s algorithm; this algorithm was based on integer linear programming. Nelson and Oppen [23] proposed an elegant framework of cooperating decision procedures with a simplex based algorithm for solving linear inequalities over the rationals as one of the decision procedures in the Stanford Pascal Verifier. This framework was adopted in the Eves system and has been implemented in its theorem prover [5].

In the mid-80’s, Boyer and Moore [3] incorporated a rational-based procedure in their theorem prover for automating proofs by induction.^2 They extensively discussed their implementation, in particular representations used for building a data base for rewriting using contexts, as well as specific techniques used to manage the interaction between the rewriter and the decision procedure.

Lassez  and  his  group  have  been  investigating  efficient  methods  for  linear  constraints  over  the 
rationals  in  the  context  of  constraint  logic  programming  [19,  20].  In  particular,  they  have  revived 
interest  in  Fourier’s  algorithm  for  checking  the  satisfiability  of  linear  constraints  over  the  reals  and 
the  rationals.  In  [20],  Lassez  and  Maher  discussed  Fourier’s  algorithm  in  detail  and  showed  how 
it  could  be  used  to  deduce  implicit  equalities  in  inequality  constraints.  The  discussion  of  Fourier’s 
algorithm  below  is  based  on  [20]. 

We  have  recently  learned  of  another  interesting  application  of  reasoning  methods  over  numbers 
in  the  area  of  data  dependency  analysis  for  compilers  for  supercomputers.  Presburger  arithmetic 
has  been  used  to  study  dependency  among  reads  and  writes  of  array  elements  in  loop  programs.  For 
more  details,  the  reader  can  consult  [24]. 


2  Algorithms  for  Reasoning  about  Linear  Inequalities 

The main algorithm for solving linear inequalities that we implemented is that of Fourier [20], which, as Boyer and Moore remarked [3], “is just a formalization of the high school idea of ‘cross multiplying and adding’ equalities to eliminate variables.” In this section, we first review Fourier’s algorithm for checking satisfiability of linear inequalities over the rationals, and then describe how Fourier’s algorithm can be extended to derive implicit equalities [20]. Later, we discuss our extension of Fourier’s algorithm to linear integer inequalities. This extension serves as a basis for a decision procedure for universally quantified Presburger arithmetic.

2 We believe that Hodes’ method [7] implemented in Boyer and Moore’s prover is essentially Fourier’s algorithm (perhaps with minor variations).

Henceforth, by Presburger arithmetic, we mean the universally quantified theory built using natural number constants (or integer constants whenever that is evident from the context), variables over the natural numbers (or integers), addition (and subtraction), the usual arithmetical relations (such as $=, >, \geq, <, \leq$) and the usual first order logical connectives; no other symbols are assumed. By a Presburger formula, we mean a quantifier-free formula in Presburger arithmetic. Later we consider extended Presburger arithmetic in which function symbols are introduced; if a function symbol is not constrained using any additional properties, it is considered to be uninterpreted; otherwise it is considered to be interpreted. Following [20], we now introduce some terminology.

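Definition 2.1 An (integral) inequality is an expression of the form $\sum_{j=1}^{n} a_j x_j \leq b$ where the $a_j$ and $b$ are integers. Let $C$ be an inequality $\sum_{j=1}^{n} a_j x_j \leq b$; $C^{=}$ denotes the equation $\sum_{j=1}^{n} a_j x_j = b$. For $\alpha > 0$, $\alpha C$ is the inequality obtained by multiplying $C$ by $\alpha$, i.e. $\sum_{j=1}^{n} \alpha a_j x_j \leq \alpha b$. The sum $C_1 + C_2$ of inequalities $C_1$ ($\sum_{j=1}^{n} a_{1j} x_j \leq b_1$) and $C_2$ ($\sum_{j=1}^{n} a_{2j} x_j \leq b_2$) is the inequality $\sum_{j=1}^{n} (a_{1j} + a_{2j}) x_j \leq (b_1 + b_2)$. If an inequality $H$ can be expressed as $H = \sum_{k=1}^{m} \alpha_k C_k$ with $\alpha_k \geq 0$ ($1 \leq k \leq m$), then $H$ is called a non-negative linear combination of the inequalities $C_1, \ldots, C_m$. When each $\alpha_k$ is strictly positive, $H$ is a positive linear combination.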

We define in the usual way the concepts of satisfiability of formulas over a domain and of logical consequence ($\models$). We define an implicit equality in a domain in a set $\mathcal{P}$ of inequalities to be an equation $C^{=}$ where $C$ is an inequality in $\mathcal{P}$ and $\mathcal{P} \models C^{=}$ over the domain. A domain could be rationals, integers or naturals. Often, the domain would be evident from the context.

2.1  Fourier’s  algorithm  for  rationals 

Let $\mathcal{P}$ be a set of inequalities, and let $x$ be a variable appearing in it. Let $C_i$ be an inequality in $\mathcal{P}$ with a negative coefficient $-c_i$ of $x$, $c_i > 0$, and $C_j$ be an inequality in $\mathcal{P}$ with a positive coefficient $c_j$ of $x$, $c_j > 0$. Then $c_i C_j + c_j C_i$ is an inequality which does not contain $x$. Such an operation is called elimination, since it eliminates an occurrence of the variable $x$ from $C_i, C_j$. Every common solution of $C_i, C_j$ is also a solution of $c_i C_j + c_j C_i$; further, every solution of $c_i C_j + c_j C_i$ can be extended to get a common solution of $C_i, C_j$. Because of this relationship between the solutions of the derived inequality and those of $C_i, C_j$, i.e. a common solution of $C_i, C_j$ can be projected by forgetting the value of $x$ to get a solution of $c_i C_j + c_j C_i$, this operation is also called projection.

Given $\mathcal{P}$, we can derive from $\mathcal{P}$ all inequalities that do not have any occurrence of $x$ by pairing inequalities containing $x$ with negative coefficients with inequalities containing $x$ with positive coefficients. The set of all such inequalities, together with the inequalities of $\mathcal{P}$ which do not contain $x$, is the result of a macro Fourier step eliminating $x$ completely from $\mathcal{P}$. Typically many redundant inequalities are generated, which can be detected and deleted using the test that $C \leq c$ implies $\alpha C \leq d$ for any $\alpha > 0$, $\alpha c \leq d$.

If $x$ does not have a positive coefficient in any inequality of $\mathcal{P}$ or $x$ does not have a negative coefficient in any inequality of $\mathcal{P}$, then the Fourier step eliminating $x$ from $\mathcal{P}$ simply deletes all inequalities containing $x$. This is because an appropriate value of $x$ can be constructed to satisfy all such inequalities. Such a Fourier step is called trivial.

Fourier’s algorithm consists of repeatedly performing macro Fourier steps, eliminating one variable at a time, until either the set contains a contradictory inequality $0 \leq c$ where $c$ is negative, implying that the original set of inequalities is unsatisfiable, or all variables are eliminated without yielding any contradictory inequality, thus implying that the original set of inequalities is satisfiable.
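The procedure just described can be sketched in a few lines. This is a minimal illustration rather than the implementation used in RRL or Tecton: the representation (a pair of a coefficient dictionary and a bound, meaning "sum of coefficient times variable is at most the bound") and the function names are assumptions made for the example, and no redundancy elimination is attempted.

```python
from fractions import Fraction

def eliminate(ineqs, x):
    """One macro Fourier step: eliminate variable x from a list of
    inequalities (coeffs, b), each meaning sum(coeffs[v] * v) <= b."""
    neg = [c for c in ineqs if c[0].get(x, 0) < 0]
    pos = [c for c in ineqs if c[0].get(x, 0) > 0]
    out = [c for c in ineqs if c[0].get(x, 0) == 0]
    for cn, bn in neg:
        for cp, bp in pos:
            ci, cj = -cn[x], cp[x]              # both strictly positive
            coeffs = {}
            for v in set(cn) | set(cp):
                a = ci * cp.get(v, 0) + cj * cn.get(v, 0)
                if a != 0:                      # x itself always cancels
                    coeffs[v] = a
            out.append((coeffs, ci * bp + cj * bn))
    return out                                  # a trivial step just drops x

def satisfiable_over_rationals(ineqs):
    work = [({v: Fraction(a) for v, a in c.items() if a != 0}, Fraction(b))
            for c, b in ineqs]
    for x in sorted({v for c, _ in work for v in c}):
        work = eliminate(work, x)
    # Only variable-free inequalities 0 <= b remain.
    return all(b >= 0 for _, b in work)

# x + y <= 1, -x <= 0, -y <= -2 has no rational solution.
print(satisfiable_over_rationals([({"x": 1, "y": 1}, 1),
                                  ({"x": -1}, 0),
                                  ({"y": -1}, -2)]))
```

The number of inequalities can grow quadratically at each step, which is why the redundancy test mentioned above matters in practice.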

It is easy to see that any inequality $C$ produced in Fourier’s algorithm is a non-negative linear combination of inequalities $C_1, \ldots, C_m \in \mathcal{P}$. The following theorems from Käufl ([16]) and Lassez and Maher ([20]) serve as a basis for deriving implicit equalities from a set $\mathcal{P}$ of inequalities.
Theorem 2.2 If $\mathcal{P}$ implies an equality $e$, there is a finite subset $\{C_1, \ldots, C_k\}$ of $\mathcal{P}$ such that $\mathcal{P} \models e$

Theorem 2.3 If $\sum_{k=1}^{m} \alpha_k C_k = 0 \leq 0$, $C_k \in \mathcal{P}$, $\alpha_k > 0$, $1 \leq k \leq m$, then each $C_j$, $1 \leq j \leq m$, is an implicit equality ([20]).

Theorem 2.4 An inequality $C_k$ in a set of consistent inequalities $\mathcal{P}$ is an implicit equality in rationals iff Fourier’s algorithm produces an inequality $0 \leq 0$ using $C_k$ ([20]).


2.2  Extending  Fourier’s  algorithm  to  integers 

Fourier’s algorithm can be extended to form a complete decision procedure for checking satisfiability of linear inequalities over the integers. It is easy to see that if a set of inequalities is unsatisfiable over the rationals, it is unsatisfiable over the integers. It is however possible for a set of inequalities to be unsatisfiable over the integers and yet have a satisfying assignment over the rationals. A simple example is that of $\{2x \leq 1,\ -2x \leq -1\}$.

In the extended Fourier’s algorithm the basic Fourier step is the same, i.e. given an inequality $C_i$ in $\mathcal{P}$ with a negative integral coefficient $-c_i$ of $x$, $c_i > 0$, and $C_j$, another inequality in $\mathcal{P}$, with a positive integral coefficient $c_j$ of $x$, $c_j > 0$, $d_i C_j + d_j C_i$ is an inequality which does not contain $x$, where $d_i = c_i / \gcd(c_i, c_j)$ and $d_j = c_j / \gcd(c_i, c_j)$. It would have been okay to multiply $C_j$ by $c_i$ and $C_i$ by $c_j$, but one is likely to see smaller integers if we remove the gcd of $c_i, c_j$ from the multipliers.

If during the algorithm, while eliminating a variable $x$, an inequality of the form $0 \leq c$, where $c$ is positive, is generated, then in contrast to the rational case, there is no guarantee any more that the inequalities leading to this inequality can be satisfied. Unlike in the rational case, it may not be possible to extend an integer solution of the derived inequalities (which does not have any assignment for $x$) to an integer solution of the original inequalities, as the following simple example illustrates. Let $C_i = 4x \leq 7$, $C_j = -6x \leq -8$. We deduce the inequality $3 \cdot 4x - 2 \cdot 6x \leq 3 \cdot 7 - 2 \cdot 8$, which is equivalent to $0 \leq 5$. The two inequalities being considered are $12x \leq 21$ and $-12x \leq -16$, i.e. $16 \leq 12x \leq 21$. But there is no integer $x$ for which $12x$ lies in this interval. So if after eliminating all the variables, no contradictory inequality is generated, there is no guarantee yet that the inequalities are satisfiable. It becomes necessary to check whether intervals for variables are large enough to produce a satisfying assignment. For unsatisfiability, an additional constraint can be added: if $C_i$ is equal to $A \leq c_i x$ and $C_j$ is equal to $c_j x \leq B$, add $d_i B \leq d_j A + \mathrm{lcm}(c_i, c_j) - d_i - d_j$.
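The example can be replayed mechanically. The sketch below is an illustration under assumed conventions, not the paper's code: a lower bound `(A, ci)` encodes $A \leq c_i x$, an upper bound `(cj, B)` encodes $c_j x \leq B$, and the function name `integer_fourier_step` is hypothetical.

```python
from math import gcd

def integer_fourier_step(lower, upper):
    """lower = (A, ci) encodes A <= ci*x; upper = (cj, B) encodes cj*x <= B,
    with ci, cj > 0. Returns (L, lo, hi, derived_ok, has_integer_x) where
    lo <= L*x <= hi is the scaled interval for x."""
    (A, ci), (cj, B) = lower, upper
    g = gcd(ci, cj)
    di, dj = ci // g, cj // g          # multipliers with the gcd removed
    L = ci * cj // g                   # lcm(ci, cj)
    lo, hi = dj * A, di * B
    derived_ok = lo <= hi              # the x-free consequence holds
    # Largest multiple of L not exceeding hi; floor division handles negatives.
    has_integer_x = (hi // L) * L >= lo
    return L, lo, hi, derived_ok, has_integer_x

# 4x <= 7 and -6x <= -8 (i.e. 8 <= 6x): the derived inequality 0 <= 5 holds,
# yet no multiple of 12 lies in [16, 21].
print(integer_fourier_step((8, 6), (4, 7)))
```

The last two fields of the result differ exactly when the rational relaxation is satisfiable but the integer problem is not.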

We  now  give  the  details  with  examples. 


2.2.1  Preprocessing  equalities  and  inequalities 

Equalities  are  processed  first.  For  each  equality,  it  is  checked  whether  the  gcd  of  the  coefficients  of 
the  variables  divides  the  constant  in  the  equality;  if  it  does  not,  then  the  equality  cannot  be  satisfied. 
In  an  equality,  a  variable  with  a  unit  coefficient  is  preferred  for  elimination.  A  system  of  equalities  is 
solved  using  the  algorithm  given  in  [17].  Variables  solved  for  can  be  eliminated  from  the  inequalities. 

A constraint $c_1 x_1 + \cdots + c_l x_l \leq d$, where $d$ is an integer, can be simplified by dividing it by $g = \gcd(c_1, \ldots, c_l)$ to $c'_1 x_1 + \cdots + c'_l x_l \leq d'$, where $c'_i = c_i / g$ and $d' = \lfloor d / g \rfloor$.
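Both preprocessing steps are short enough to sketch directly. The helper names `equality_solvable` and `tighten` are hypothetical, and the sketch assumes at least one coefficient is nonzero.

```python
from functools import reduce
from math import gcd

def equality_solvable(coeffs, d):
    """gcd test for c1*x1 + ... + cl*xl = d over the integers."""
    g = reduce(gcd, (abs(c) for c in coeffs))
    return d % g == 0

def tighten(coeffs, d):
    """Normalize c1*x1 + ... + cl*xl <= d by dividing through by the gcd
    of the coefficients and rounding the constant down."""
    g = reduce(gcd, (abs(c) for c in coeffs))
    return [c // g for c in coeffs], d // g     # // is floor division

print(equality_solvable([2, 4], 3))   # 2x + 4y = 3
print(tighten([4, 4], 3))             # 4x + 4y <= 3
print(tighten([-2, -2], -1))          # -2x - 2y <= -1
```

Rounding the constant down is sound only because the left-hand side takes integer values; over the rationals the same step would discard solutions.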

Below,  we  assume  that  equalities  have  been  processed  and  inequalities  have  been  simplified.  Any 
new  equality  or  inequality  generated  during  the  extended  Fourier  procedure  is  preprocessed  in  the 
same  way. 
2.2.2 Processing inequalities

As in [20], given a set $\mathcal{P}$ of integer inequalities, classify all inequalities based on whether $x$ appears with a negative coefficient, positive coefficient, or does not appear at all.

$$\begin{array}{ll} l_i \leq c_i x, & i = 1, 2, \ldots, p, \\ d_j x \leq r_j, & j = 1, 2, \ldots, q, \\ g_l \leq 0, & l = 1, 2, \ldots, s. \end{array}$$

Let $\mathcal{P}'$ be the set of inequalities deduced from $\mathcal{P}$ by eliminating $x$ by pairwise resolution of an inequality with a negative coefficient of $x$ and an inequality with a positive coefficient of $x$. Assuming that $\mathcal{P}'$ is satisfiable and $\sigma$ is a satisfying assignment for $\mathcal{P}'$, it is possible to extend this assignment to include a value of $x$ to have a satisfying assignment for $\mathcal{P}$ provided the value lies in the intervals specified by $\mathcal{P}$. This check is delayed until the end; however, it is necessary to keep track of intervals for possible values that $x$ can take.

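Let $L = \mathrm{lcm}(c_1, \ldots, c_p, d_1, \ldots, d_q)$. Then inequalities in $\mathcal{P}'$ imply these inequalities:

$$\begin{array}{ll} \frac{L}{c_i} l_i \leq \frac{L}{d_j} r_j, & i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, q, \\ g_l \leq 0, & l = 1, 2, \ldots, s. \end{array}$$

If $\mathcal{P}'$ has a solution $\sigma$, then $\sigma$ can be extended to include a value of $x$ that would be a solution of $\mathcal{P}$ provided an integral multiple of $L$ lies in the interval

$$\left[ \max_i \sigma\!\left(\tfrac{L}{c_i} l_i\right),\ \min_j \sigma\!\left(\tfrac{L}{d_j} r_j\right) \right], \quad i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, q.$$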

Let the finite set of inequalities $\{\frac{L}{c_i} l_i \leq L x \leq \frac{L}{d_j} r_j,\ i = 1, 2, \ldots, p,\ j = 1, 2, \ldots, q\}$ be called the defining constraint on $x$, denoted as $Def_x$. If for every solution $\sigma$ of $\mathcal{P}'$, no integral multiple of $L$ lies in the interval defined by instantiating the defining constraint of $x$ by $\sigma$, then $\mathcal{P}$ is unsatisfiable even if $\mathcal{P}'$ is satisfiable. For checking satisfiability, it then becomes necessary to generate an assignment at the end of the algorithm and extend it by making sure that every variable satisfies its defining constraint. While generating a satisfying assignment, we may find that it is not possible to extend a partial assignment further, thus detecting unsatisfiability.

If a Fourier step involves deleting inequality constraints because a variable $x$ being eliminated appears only with negative coefficients (or positive coefficients), say, $l_i \leq c_i x$, $c_i > 0$, $1 \leq i \leq p$ ($d_j x \leq r_j$, $d_j > 0$, $1 \leq j \leq q$, respectively), then $\max\{\frac{L}{c_i} l_i \mid 1 \leq i \leq p\} \leq L x$ ($L x \leq \min\{\frac{L}{d_j} r_j \mid 1 \leq j \leq q\}$, respectively) serves as the defining constraint for $x$.

Theorem 2.5 Let $\mathcal{P}$ be a finite set of inequalities as defined above. Let $\mathcal{P}'$ be the inequalities generated from $\mathcal{P}$ after eliminating $x$. Let the defining constraint for $x$ be

$$\frac{L}{c_i} l_i \leq L x \leq \frac{L}{d_j} r_j, \quad i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, q.$$

$\mathcal{P}$ is satisfiable if and only if $\mathcal{P}'$ is satisfiable using an assignment $\sigma$ and there exists a multiple of $L$ in the interval $\left[\max\{\sigma(\frac{L}{c_i} l_i)\},\ \min\{\sigma(\frac{L}{d_j} r_j)\}\right]$.

3 For simplicity, we assume in the discussion below that exactly one variable $x$ is eliminated in a macro Fourier step. The case of more than one variable getting eliminated while eliminating one variable can be handled in a similar manner.

4 Because of pairwise resolution, inequalities generated in $\mathcal{P}'$ are likely to have smaller numbers appearing as coefficients. Cooper [4] instead multiplied every inequality with $L/c_i$, where $c_i$ is the absolute value of the nonzero coefficient of $x$ in an inequality $C_i$, replaced $Lx$ by a new variable $x'$ and required $x'$ to be divisible by $L$.
Example 2.6 Let $\mathcal{P} = \{4x + 4y \leq 3,\ 2x + 2y \geq 1\}$. After a Fourier step to eliminate $x$, one gets $2 - 4y \leq 3 - 4y$, where $L = \mathrm{lcm}(2, 4) = 4$, which gives the inequality $2 \leq 3$, which is satisfiable. The defining constraint for $x$ is $2 - 4y \leq 4x \leq 3 - 4y$, and there is no integral multiple of 4 in the interval $[2 - 4y, 3 - 4y]$. Hence $\mathcal{P}$ is unsatisfiable.
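The interval test used in this example is easy to state as code. The following is a sketch with a hypothetical helper `multiple_in_interval`; the loop over a finite range of `y` is only a spot check, since the general argument is that $2 - 4y$ and $3 - 4y$ are congruent to 2 and 3 modulo 4 for every integer $y$.

```python
def multiple_in_interval(L, lo, hi):
    """True if some integral multiple of L (L > 0) lies in [lo, hi]."""
    # (hi // L) * L is the largest multiple of L that does not exceed hi.
    return (hi // L) * L >= lo

# Example 2.6: the defining constraint is 2 - 4y <= 4x <= 3 - 4y.
extensible = [y for y in range(-50, 51)
              if multiple_in_interval(4, 2 - 4 * y, 3 - 4 * y)]
print(extensible)   # prints []
```

No sampled value of `y` admits an integer `x`, in agreement with the conclusion of the example.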

In the above example, it could be deduced quickly that no value of $x$ can satisfy the inequalities in $\mathcal{P}$. In general, many macro Fourier steps may have to be done, and then candidate solutions tried, before reaching an interval over which a solution cannot be extended.

There  are  many  refinements  and  optimizations  for  checking  satisfiability.  For  example,  it  is  not 
necessary  to  search  through  all  possible  assignments  of  a  variable  in  the  whole  interval  of  values 
given  by  its  defining  constraint.  We  will  not  discuss  them  here  for  lack  of  space.  Neither  will  we 
discuss  the  more  general  case  of  a  linear  combination  of  variables  possibly  getting  eliminated  in  an 
elimination  step  and  its  implications  for  checking  satisfiability.  An  interested  reader  may  consult 
[13].  We  would  however  like  to  mention  that  delaying  the  interval  check  to  the  end  is  useful  for 
unsatisfiable formulas. Although we have not performed an experimental comparison, it is likely
to be much more efficient than Cooper's method, in which the defining constraint on a variable is
expressed using mod.

Recently,  Jaffar  et  al  [9]  discussed  a  transitive  closure  algorithm  for  processing  integer  inequalities 
that  have  at  most  two  variables;  this  algorithm  is  based  on  Shostak’s  method  for  computing  loop 
residues.  They  showed  that  if  the  coefficients  of  variables  in  these  inequalities  are  units  (TVPI),  then 
satisfiability  can  be  checked  in  0{n^)  time  and  O(n^)  space,  where  n  is  the  number  of  inequalities. 
Fourier’s  algorithm  can  be  used  for  unit  TVPI  problems,  and  it  has  the  same  complexity. 


2.3  Using  implicit  equalities  in  detecting  unsatisfiability 

In  contrast  to  Cooper’s  approach  as  well  as  Shostak’s  approach,  a  rewrite  rule  based  prover  is  always 
on  the  lookout  for  deducing  equalities  so  that  they  can  be  used  for  simplification,  rewriting,  and 
eliminating  variables.  RRL  uses  equalities  (implicit  as  well  as  explicit)  as  rewrite  rules.  As  soon 
as  such  an  equality  is  identified,  it  can  be  used  as  a  rewrite  rule  to  simplify  other  inequalities. 
This  becomes  especially  important  while  using  the  extended  Fourier’s  algorithm  for  deciding  linear 
arithmetic  with  uninterpreted  function  symbols  as  well  as  in  the  presence  of  function  symbols  defined 
using  rewrite  rules.  Below  we  discuss  how  implicit  equalities  are  derived  using  the  extended  Fourier’s 
algorithm  for  integer  inequalities. 

The  method  of  Lassez  and  Maher  discussed  earlier  for  the  case  of  linear  inequalities  over  the 
rationals  extends  to  integers  as  the  following  theorem  states. 

Theorem 2.7 An inequality $C_k$ in a set of consistent inequalities $\mathcal{P}$ is an implicit equality in integers
if the extended Fourier's algorithm produces an inequality $0 \le 0$ using $C_k$.


Example  2.8  Consider  the  set  of  integer  linear  inequalities 

$C_1:\ a - b \le 0$, $C_2:\ b - f(a) \le 0$, $C_3:\ f(a) \le 1$,

$C_4:\ -a - b \le -2$, $C_5:\ -b - f(b) \le -2$, $C_6:\ f(a) - f(f(b)) \le -1$.

It is easy to verify that

$C_1 + 2C_2 + 2C_3 + C_4 \;=\; 0 \le 0$.

Thus we have

$C_1^{=}:\ a = b$, $C_2^{=}:\ b = f(a)$,

$C_3^{=}:\ f(a) = 1$, $C_4^{=}:\ a + b = 2$.

As will be clear from subsequent discussion, the above equalities are used to deduce that each of
$a$, $b$, $f(a)$, $f(b)$, $f(f(b))$ takes the value 1. From this, the unsatisfiability of the above set of inequalities
follows since $C_6$ cannot be satisfied.
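The linear combination used in Example 2.8 can be checked mechanically. The sketch below treats the functional terms as opaque atoms and assumes that the fourth inequality reads $-a - b \le -2$, which is inferred from the implicit equality $a + b = 2$ stated in the example; the helper name `combine` is hypothetical.

```python
def combine(ineqs, multipliers):
    """Nonnegative linear combination of inequalities (coeffs, bound), read as sum <= bound."""
    total, bound = {}, 0
    for name, k in multipliers.items():
        coeffs, b = ineqs[name]
        for atom, c in coeffs.items():
            total[atom] = total.get(atom, 0) + k * c
        bound += k * b
    return {atom: c for atom, c in total.items() if c != 0}, bound

# Functional terms such as f(a) are treated as atoms (plain strings).
ineqs = {
    "C1": ({"a": 1, "b": -1}, 0),       # a - b <= 0
    "C2": ({"b": 1, "f(a)": -1}, 0),    # b - f(a) <= 0
    "C3": ({"f(a)": 1}, 1),             # f(a) <= 1
    "C4": ({"a": -1, "b": -1}, -2),     # -a - b <= -2 (assumed form)
}
lhs, rhs = combine(ineqs, {"C1": 1, "C2": 2, "C3": 2, "C4": 1})
print(lhs, rhs)
```

An empty left-hand side with bound 0 is the inequality $0 \le 0$, so each inequality taking part in the combination must hold with equality.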
The reader should note two crucial properties which differ from the case of inequalities
over the rationals. First, there is no guarantee that the deduced implicit equality is satisfiable,
as the inequalities from which the equality is deduced may not be satisfiable. Second,
the converse of the above theorem does not hold. That is, there are equalities (implicit as well as
derived) of $\mathcal{P}$ which cannot be deduced directly by the extended Fourier's algorithm. It is necessary
to do some extra work, as illustrated by the following examples.

Example  2.9  Consider  the  following  set  of  integer  linear  inequalities 

$C_1:\ x - 4y \le -1$, $C_2:\ -x + 4y \le 2$,

$C_3:\ x \le 2$, $C_4:\ -x \le -1$.

Eliminating $y$ from $C_1, C_2$, we obtain $0 \le 1$; the defining constraint for $y$ is $x + 1 \le 4y \le x + 2$.
Eliminating $x$ from $C_3, C_4$, we obtain $0 \le 1$; the defining constraint for $x$ is $1 \le x \le 2$. So the
extended Fourier's algorithm does not deduce any equality. However, the above set of inequalities is
satisfiable, and the only solution is $x = 2$, $y = 1$, since for $x = 1$ there is no $y$ satisfying the defining
constraint of $y$. This implies that $4y = x + 2$ as well as $x = 2$ are implicit equalities. This example
illustrates that even though the constraint on $x$ suggests two assignments, one of those assignments
cannot be extended as there are no values of $y$ satisfying the defining constraint of $y$.

Example  2.10  Consider  the  following  set  of  integer  linear  inequalities 

$C_1:\ x - 4y \le -1$, $C_2:\ -x + 4y \le 2$,

$C_3:\ x \le 3$, $C_4:\ -x \le -2$.

The defining constraint for $y$ is $x + 1 \le 4y \le x + 2$, and the defining constraint for $x$ is $2 \le x \le 3$.
For both values of $x$, $y = 1$, which is an equality that follows but cannot be obtained directly from
any inequality by replacing $\le$ by $=$. This example illustrates that a finite set of linear inequalities
may not have any inequality which is really an equality but it may imply other equalities.
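Both examples are small enough to confirm by enumeration. The following sketch lists all integer solutions of $x - 4y \le -1$ and $-x + 4y \le 2$ for a given range of $x$; the function name `solutions` and the search window for $y$ are assumptions chosen for illustration (the window is wide enough because $x$ is bounded).

```python
def solutions(x_lo, x_hi, y_range=range(-10, 11)):
    """Integer solutions of x - 4y <= -1 and -x + 4y <= 2 with x_lo <= x <= x_hi."""
    return [
        (x, y)
        for x in range(x_lo, x_hi + 1)
        for y in y_range
        if x - 4 * y <= -1 and -x + 4 * y <= 2
    ]

example_2_9 = solutions(1, 2)    # bounds 1 <= x <= 2
example_2_10 = solutions(2, 3)   # bounds 2 <= x <= 3
print(example_2_9, example_2_10)
```

With $1 \le x \le 2$ the only solution has $x = 2$ and $y = 1$; with $2 \le x \le 3$ both solutions share $y = 1$, even though no single inequality is tight for every solution.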

Extending  Fourier’s  algorithm  to  deduce  all  implicit  equalities  from  linear  inequalities  over  the 
integers  is  an  interesting  challenge. 

2.4  A  decision  procedure  for  pure  linear  arithmetic 

A  quantifier-free  formula  without  any  (uninterpreted  or  interpreted)  function  symbols  can  be  decided 
using  the  extended  Fourier’s  algorithm.  Given  a  formula  F,  its  proof  can  be  done  by  refutation.  An 
easy  (but  inefficient)  way  is  to  transform  the  negation  of  F  into  a  disjunctive  normal  form. 


$\neg F = G_1 \lor G_2 \lor \cdots \lor G_n$

where each $G_i$ is a conjunction of integer inequalities.⁵ Henceforth, a disjunctive normal form will
be used for the sake of simplicity. The formula $\neg F$ is satisfiable iff one of the $G_i$ is satisfiable, and so $F$
is valid iff none of the $G_i$ is satisfiable.

Every literal is transformed into a disjunction of conjunctions of inequalities using only the $\le$
predicate. For example, $A \ne B$ is replaced by $(A > B \lor A < B)$; in the case of naturals and integers,
$A < B$ is replaced by $A + 1 \le B$, whereas for rationals, a new variable, called a surplus variable, is
introduced, so we have $A + z \le B$ with the requirement that $z$ is strictly positive. First, equalities
are solved and used to eliminate variables. Inequalities are simplified. Then the extended Fourier's
procedure is applied to the inequalities. Any equalities and inequalities deduced are first preprocessed,
with redundant inequalities removed.
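The literal transformation for the integer case can be sketched as follows. This is an illustration only: linear expressions are assumed to be dictionaries from variable names to coefficients, each returned expression `e` stands for the inequality `e <= 0`, and the names `to_leq`, `diff`, and `CONST` are hypothetical.

```python
CONST = "const"  # hypothetical key for the constant part of a linear expression

def diff(a, b, extra=0):
    """Linear expression a - b + extra as a dict (variable -> coefficient)."""
    out = dict(a)
    for v, c in b.items():
        out[v] = out.get(v, 0) - c
    out[CONST] = out.get(CONST, 0) + extra
    return {v: c for v, c in out.items() if c != 0}

def to_leq(a, op, b):
    """Rewrite the integer literal `a op b` as a disjunction of conjunctions.

    The result is a list of alternatives; each alternative is a list of
    expressions e, every one of which stands for e <= 0.
    """
    if op == "<=":
        return [[diff(a, b)]]
    if op == "<":                      # a < b  iff  a + 1 <= b over the integers
        return [[diff(a, b, 1)]]
    if op == ">=":
        return [[diff(b, a)]]
    if op == ">":
        return [[diff(b, a, 1)]]
    if op == "==":                     # a <= b and b <= a
        return [[diff(a, b), diff(b, a)]]
    if op == "!=":                     # a < b or a > b
        return [[diff(a, b, 1)], [diff(b, a, 1)]]
    raise ValueError(f"unknown operator: {op}")

x, y = {"x": 1}, {"y": 1}
strict = to_leq(x, "<", y)
disequal = to_leq(x, "!=", y)
print(strict, disequal)
```

For rationals the strict cases would instead introduce a surplus variable, as described above; that case is not covered by this sketch.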

5 As pointed out by Cooper [4], it is not necessary to transform a formula into a disjunctive normal form; instead,
once negation is pushed down to the literals, the resulting formula can be checked for satisfiability as it is. Another
approach toward avoiding generating a disjunctive normal form is to use the if-then-else notation used in the AFFIRM
system as well as Boyer and Moore's theorem prover.
It should be noted that one need not have three separate implementations of Fourier's algorithm:
one for rationals, another for integers and a third for naturals. Instead, it suffices to have an
implementation of the extended Fourier's algorithm. In the case of integers or naturals, satisfiability
is declared only if no contradictory inequality is generated and a satisfying assignment for variables
can be generated from their defining constraints. In the case of rationals, satisfiability is declared if
no contradictory inequality is generated. Further, surplus variables are eliminated at the end, and
before elimination, it is checked that each surplus variable can indeed be assigned a positive value
(which is ensured by checking that the interval for the surplus variable includes a positive value);
one way to check this is to analyze the minimum of upper bounds for positiveness.

For natural numbers, an additional constraint of nonnegativeness for every variable over the
naturals is added. But more importantly, the semantics of subtraction, to be denoted by $\ominus$ (also
known as Peano subtraction), is different. Given $x \ominus y$, there are two cases to be considered: (i) $x$
is no smaller than $y$, in which case the result is $x - y$, and (ii) $x$ is smaller than $y$, in which case the result is 0.
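The two cases of Peano subtraction can be written down directly. In this small sketch, `monus` computes the operation on concrete naturals, while `monus_cases` is a hypothetical helper that spells out the guarded case split a prover would have to consider for symbolic arguments.

```python
def monus(x, y):
    """Peano (truncated) subtraction on natural numbers."""
    return x - y if x >= y else 0

def monus_cases(x, y):
    """Guarded cases for a symbolic `x monus y`, as (condition, value) pairs."""
    return [(f"{x} >= {y}", f"{x} - {y}"), (f"{x} < {y}", "0")]

print(monus(5, 3), monus(3, 5), monus_cases("x", "y"))
```

Each occurrence of the operator doubles the number of cases, which is why limiting the case analysis matters for efficiency.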

3  Linear  Arithmetic  with  Uninterpreted  Function  Symbols 

Given a Presburger formula $G$ with occurrences of uninterpreted function symbols, a formula $G_a$ is
generated, which is a conjunction $GE \land G'$, such that $G'$ is a pure Presburger formula without
any uninterpreted symbols and $GE$ is a conjunction of equalities relating functional terms to new
symbols, called abstraction constants. $GE$ does not include any occurrence of any arithmetic operator
other than $=$. $G_a$ is unsatisfiable if and only if $G$ is unsatisfiable.

Equalities  in  G'  are  used  to  eliminate  variables  from  inequalities  in  G'  as  well  as  from  GE.  The 
resulting  equalities  in  GE  can  be  processed  using  a  congruence  closure  algorithm  [23]  or  a  Knuth- 
Bendix completion procedure on ground terms which is guaranteed to terminate [18, 8, 12]. In RRL,
the Knuth-Bendix completion procedure on ground terms is used, which gives a complete decision
procedure  for  ground  equalities  as  a  finite  set  of  rewrite  rules.  The  rewrite  rules  are  used  to  normalize 
G'. 

The formula $G'$ is subsequently processed using the extended Fourier's algorithm. Any equality
deduced from $G'$ is added to $GE$ and a new canonical ground rewrite system is incrementally generated
by adding the new equalities. This may result in additional equalities relating functional terms
and hence abstraction constants, which may further simplify the inequalities currently being considered
in the extended Fourier's algorithm (see Example 2.11 for instance, in which implicit equalities
are used to relate terms).

This  repeated  interaction  between  the  decision  procedure  for  equality  on  ground  terms,  equalities 
deduced  from  Fourier’s  algorithm  and  normalization  of  inequalities  using  equalities  terminates.  This 
is  so  because  (i)  the  ground  completion  procedure  always  terminates  and  (ii)  Fourier’s  algorithm 
terminates  since  in  each  macro  step,  at  least  one  variable  appearing  in  the  inequalities  is  eliminated 
(and  number  of  variables  appearing  in  the  inequalities  never  increases;  the  number  may  decrease 
because  of  equalities). 

The soundness of the procedure follows from the soundness of the ground completion and Fourier's
algorithm. So, if unsatisfiability is detected by Fourier's algorithm, then $G$ is unsatisfiable. Otherwise,
if the procedure terminates giving a satisfying assignment for $G_a$, then $G$ can be shown to
be satisfiable by building a model for it, provided all equalities that can be deduced from $GE$ and
$G'$ have been made explicit using Fourier's algorithm and ground completion. Assuming that there
is a way to extend Fourier's algorithm to deduce all implicit equalities, this approach gives a complete
decision procedure for Presburger arithmetic with uninterpreted symbols. In the absence of a
procedure for deducing all implicit equalities from integer inequalities, a method proposed in [23]
can be used. All possible equalities among variables appearing in $G_a$ are deduced using a complete
decision procedure for unsatisfiability of integer inequalities.
Example 3.1 Consider an example from [27]:

$z = f(x - y) \land x = z + y \land -y \ne -(x - f(f(z)))$.

The functional terms $f(x - y)$, $x - y$, and $f(f(z))$ are abstracted to be $u_1$, $u_2$ and $u_3$, respectively.
We thus have:

$(f(x - y) = u_1 \land x - y = u_2 \land f(f(z)) = u_3) \land (z = u_1 \land x = z + y \land -y \ne -(x - u_3))$.

Ground completion on equalities would give: $\{z \to u_1,\ x \to y + u_1,\ u_2 \to u_1,\ u_3 \to u_1,\ f(u_1) \to u_1\}$.
From this and $u_1 \ne u_3$, a contradiction follows.
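The equality reasoning in Example 3.1 can be reproduced with a naive congruence closure, one of the two options mentioned above for processing the ground equalities. The sketch below is not RRL's ground completion; it is a quadratic union-find illustration with hypothetical names. It assumes that the arithmetic part has already yielded $u_2 = z$ (from $x = z + y$ and $x - y = u_2$), so that $f(x - y)$ can be written as $f(u_2)$.

```python
def subterms(t, acc):
    """Collect t and all of its subterms; applications are tuples (symbol, args...)."""
    acc.add(t)
    if isinstance(t, tuple):
        for arg in t[1:]:
            subterms(arg, acc)

def congruence_closure(equalities):
    """Return a `find` function mapping each term to its class representative."""
    terms = set()
    for s, t in equalities:
        subterms(s, terms)
        subterms(t, terms)
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def union(s, t):
        rs, rt = find(s), find(t)
        if rs != rt:
            parent[rs] = rt
            return True
        return False

    for s, t in equalities:
        union(s, t)
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:  # merge applications whose arguments are already equal
        changed = False
        for s in apps:
            for t in apps:
                if (s[0] == t[0] and len(s) == len(t)
                        and all(find(a) == find(b) for a, b in zip(s[1:], t[1:]))):
                    changed |= union(s, t)
    return find

equalities = [
    (("f", "u2"), "u1"),          # f(x - y) = u1, with x - y abstracted to u2
    (("f", ("f", "z")), "u3"),    # f(f(z)) = u3
    ("z", "u1"),                  # z = u1
    ("u2", "z"),                  # assumed: derived from x = z + y and x - y = u2
]
find = congruence_closure(equalities)
print(find("u1") == find("u3"))
```

Once $u_1$ and $u_3$ fall into the same class, the disequality $u_1 \ne u_3$ obtained from the third conjunct is contradicted.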


4  Integration  of  Fourier’s  Algorithm  in  a  Rewrite  System 

Most work on decision procedures for Presburger arithmetic has assumed formulas in pure Presburger arithmetic [1, 6]
or  formulas  with  uninterpreted  function  symbols  [4,  2,  25,  26].  In  [27,  23],  methods  for  combining 
decision  procedures  are  described  which  enable  handling  some  interpreted  function  symbols,  for 
example  the  quantifier-free  theory  of  lists  with  cons,  car,  cdr.  In  [3],  Boyer  and  Moore  described 
how  Hodes’  procedure  can  be  integrated  with  interpreted  symbols  defined  as  lisp  functions.  In  this 
section,  we  discuss  the  integration  of  a  complete  decision  procedure  for  Presburger  arithmetic  with 
interpreted  function  symbols  defined  as  a  finite  set  of  canonical  unconditional  rewrite  systems. 

Definition 4.1 A rewrite rule is an oriented equation of the form $\mathit{lhs} \to \mathit{rhs}$, where $\mathit{lhs}$ is its
left-hand side and $\mathit{rhs}$ is its right-hand side. A rewrite system $\mathcal{R}$ is a finite set of rewrite rules.

Given a rewrite system $\mathcal{R}$ and a term $t[t']$, where $t'$ is a subterm of $t$, $\mathcal{R}$ reduces $t$ to another term
$s$ if there is a rewrite rule $l \to r$ and a substitution $\gamma$ such that $\gamma(l) = t'$; $s$ is $t$ with the subterm $t'$
replaced by $\gamma(r)$.

Definition 4.2 A rewrite system $\mathcal{R}$ is canonical if and only if it is terminating and confluent, i.e.,
it  does  not  admit  any  infinite  rewrite  sequence  and  every  rewrite  sequence  from  a  term  t  can  be 
extended  to  give  a  unique  normal  form,  called  the  canonical  form  of  t. 

It is easy to see that $\mathcal{R}$ is a decision procedure for equations. Given an equation $s = t$, we
compute the canonical forms of $s$ and $t$, and check for equality. If they are equal, then $\mathcal{R} \models s = t$;
otherwise, $s = t$ does not follow from $\mathcal{R}$. However, it is not possible to get, in general, a decision
procedure for the quantifier-free theory of $\mathcal{R}$ even though $\mathcal{R}$ is canonical. For example, consider $\mathcal{R}$
to be the associativity rule (oriented in any way), which constitutes a canonical rewrite system and
serves as a decision procedure for the equational theory of free semi-groups. If it were possible to get
a decision procedure for the quantifier-free theory of $\mathcal{R}$, then we would be able to solve the word
problem of any finitely presented semi-group (including the ones with unsolvable word problems)
by specifying it as a conditional equation, in which the conditions are the finite presentation of the
semi-group, and the conclusion is an equation relating two words.
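Deciding an equation by comparing canonical forms can be sketched with a small innermost rewriter. The encoding is an assumption made for illustration: applications are tuples, constants are plain strings, rule variables are strings starting with `?`, and the rule set is the familiar list theory ($\mathit{cons}(\mathit{car}(x), \mathit{cdr}(x)) \to x$, $\mathit{car}(\mathit{cons}(x, y)) \to x$, $\mathit{cdr}(\mathit{cons}(x, y)) \to y$). The function names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(pattern, term, subst):
    """Extend `subst` so that the instantiated pattern equals `term`, or return None."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def substitute(t, subst):
    if is_var(t):
        return subst[t]
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(substitute(a, subst) for a in t[1:])

def normalize(t, rules):
    """Innermost rewriting to a normal form; terminates when the rules do."""
    if not isinstance(t, str):
        t = (t[0],) + tuple(normalize(a, rules) for a in t[1:])
    for lhs, rhs in rules:
        s = match(lhs, t, {})
        if s is not None:
            return normalize(substitute(rhs, s), rules)
    return t

def equal(s, t, rules):
    """Decide s = t by comparing canonical forms (sound for a canonical system)."""
    return normalize(s, rules) == normalize(t, rules)

LIST_RULES = [
    (("cons", ("car", "?x"), ("cdr", "?x")), "?x"),
    (("car", ("cons", "?x", "?y")), "?x"),
    (("cdr", ("cons", "?x", "?y")), "?y"),
]

print(equal(("car", ("cons", "a", "b")), "a", LIST_RULES))
print(equal(("car", "a"), ("cdr", "a"), LIST_RULES))
```

The check is only a decision procedure for unconditional equations; as explained above, it does not by itself decide the quantifier-free theory.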

The Knuth-Bendix completion procedure and its extensions can be used as a semi-decision procedure
for the quantifier-free theory of $\mathcal{R}$. Whether

$\mathcal{R} \models F = ((s_1 = t_1 \land \cdots \land s_k = t_k) \supset (s = t))$

can be checked as follows: let $s', t', s'_i, t'_i$ be Skolem forms of $s, t, s_i, t_i$, respectively, obtained by
introducing Skolem constants for the variables in $F$. The completion procedure can be attempted on
$\mathcal{R} \cup \{s'_i = t'_i\}$. If the conclusion $s' = t'$ in the conditional equation indeed follows, then it can be
proved while attempting to generate an augmented canonical system from $\mathcal{R}$.

If $\mathcal{R}$ is such that for any finite set $GE$ of ground equations expressed using symbols in $\mathcal{R}$ and
constants, completion terminates producing a finite canonical rewrite system $\mathcal{R}'$, then $\mathcal{R}'$ can be used
to decide whether the conclusion of a conditional equation whose conditions constitute $GE$ follows
from $\mathcal{R}$ or not. We will call a canonical $\mathcal{R}$ with this property an admissible rewrite system.
For an admissible $\mathcal{R}$, its quantifier-free theory is decidable using completion. (The above statement
is true even when the conjecture above is generalized to $(L_1 \land \cdots \land L_k) \supset L$, where $L$ as well as each
$L_i$ is an equation or a disequation.)

Example 4.3 Consider an interpreted symbol defined using the equation $f(f(x)) = x$. The rewrite
system $\{f(f(x)) \to x\}$ can be shown to be admissible. The conjecture is $f(x) = f(y) \supset x = y$. The
Skolem form of its negation is $f(a) = f(b) \land a \ne b$. The ground equality $f(a) = f(b)$ is oriented to
$f(a) \to f(b)$. Its left side $f(a)$ superposes with $f(f(x))$ giving a ground superposition $f(f(a))$, from
which $a = b$ follows, and this gives a contradiction, implying that the conjecture follows.

This example indicates that it is not sufficient to normalize literals in a conjecture using $\mathcal{R}$; instead
it is necessary to superpose ground equalities of the negation of the conjecture with rules in $\mathcal{R}$.
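The superposition in Example 4.3 can be traced with a very small model. Because $f$ is unary, a ground term $f^k(c)$ is represented here as a pair `(k, c)`; this encoding and the function names are assumptions made for illustration, not part of any prover.

```python
def by_involution(term):
    """Apply f(f(x)) -> x once at the root of f^k(c), if possible."""
    k, c = term
    return (k - 2, c) if k >= 2 else term

def by_ground_rule(term, rule):
    """Apply the ground rule f(lhs) -> f(rhs) at the innermost f of f^k(c)."""
    k, c = term
    lhs, rhs = rule
    return (k, rhs) if k >= 1 and c == lhs else term

def critical_pair(rule):
    """Reduce the overlap f(f(lhs)) in the two possible ways."""
    overlap = (2, rule[0])
    left = by_involution(overlap)                          # f(f(a)) -> a
    right = by_involution(by_ground_rule(overlap, rule))   # f(f(a)) -> f(f(b)) -> b
    return left, right

print(critical_pair(("a", "b")))
```

The two results are the constants $a$ and $b$ themselves, so the critical pair is the equality $a = b$. Normalizing the literals $f(a) = f(b)$ and $a \ne b$ alone would never produce it, which is the point of the remark above.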

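Example 4.4 As another example, consider a theory of lists with cons, car, cdr as discussed in
[23, 27]. The defining rules for the interpreted function symbols are:

1. $\mathit{cons}(\mathit{car}(x), \mathit{cdr}(x)) \to x$
2. $\mathit{car}(\mathit{cons}(x, y)) \to x$
3. $\mathit{cdr}(\mathit{cons}(x, y)) \to y$

These three rules can be shown to constitute an admissible rewrite system. The formula $F =
\mathit{cons}(\mathit{car}(x), \mathit{cdr}(\mathit{car}(y))) = \mathit{cdr}(\mathit{cons}(y, x)) \supset \mathit{cdr}(\mathit{car}(y)) = \mathit{cdr}(x)$ can be shown to follow from the
rules as follows. The Skolemized hypothesis of the negated formula gives the ground rule:

4. $\mathit{cons}(\mathit{car}(a), \mathit{cdr}(\mathit{car}(b))) \to \mathit{cdr}(\mathit{cons}(b, a))$.

Rule 4 superposes with rule 3 to produce:

5. $\mathit{cdr}(\mathit{car}(b)) \to \mathit{cdr}(a)$,

from which the contradiction follows.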

This complete decision procedure for the quantifier-free theory of an admissible rewrite system $\mathcal{R}$
can be combined with the complete decision procedure for Presburger arithmetic with uninterpreted
symbols to give a complete decision procedure in the case of both uninterpreted and interpreted
symbols. Similar to the case of Presburger arithmetic with uninterpreted symbols, functional terms
are abstracted in a conjecture using new abstraction constants. These ground equalities and other
ground equalities in a conjecture are then completed using ground completion; additional equalities
are generated by superposition of ground equalities with $\mathcal{R}$. New equalities are used to normalize
linear inequalities, from which additional implicit equalities are deduced. This interplay, similar to
that in the case of uninterpreted symbols, gives a complete decision procedure when interpreted symbols
are axiomatized by an admissible rewrite system.

There  exist  equational  theories  which  have  a  decidable  quantifier-free  theory  but  they  do  not 
admit  admissible  rewrite  systems.  Furthermore,  there  are  equational  theories  with  a  decidable 
quantifier-free  theory  which  have  a  canonical  rewrite  system,  but  the  rewrite  system  is  not  admissible. 


5  Fourier’s  Algorithm  and  Conditional  Rewriting 

Functions  on  commonly  used  data  structures  such  as  lists,  sequences,  arrays,  records,  etc.,  are 
typically  expressed  using  conditional  rewrite  rules;  unconditional  rewrite  rules  are  not  sufficient. 
Further, lemmas and theorems about functions are also typically conditional rewrite rules. Tecton
and  its  theorem  prover  RRL  support  definitions  and  lemmas  given  as  conditional  rewrite  rules. 
Definition 5.1 A conditional rewrite rule is a rule of the form

$\mathit{lhs} \to \mathit{rhs}$ if $p_1 \land p_2 \land \cdots \land p_k$,

where $\mathit{lhs}$ is the left-hand side of the rewrite rule, $\mathit{rhs}$ is the right-hand side and $p_1, p_2, \ldots, p_k$ are
the conditions.

A term $t[t']$, where $t'$ is a subterm of $t$, rewrites to another term $s$ using a conditional rewrite
rule $l \to r$ if $p_1 \land \cdots \land p_k$, if there is a substitution $\gamma$ such that $\gamma(l) = t'$ and each of the conditions
$\gamma(p_i)$ reduces to true recursively, also by rewriting using rules; $s$ is $t$ with the subterm $t'$ replaced by
$\gamma(r)$. If $\mathcal{R}$ does not include such a rule, then $t$ is said to be in a normal form. As should be evident,
conditional rewriting is a recursive process since in order to apply a rewrite rule, its conditions must
reduce to true, which itself is determined by rewriting. Termination of this process is guaranteed
by using a termination ordering $\succ_t$ in which $l \succ_t r$ as well as $l \succ_t p_i$. For a detailed treatment of
conditional rewriting used in RRL, see [29, 30].

For  combining  a  complete  decision  procedure  for  Presburger  arithmetic  with  a  decision  procedure 
for  interpreted  symbols  given  as  a  canonical  (conditional)  rewrite  system,  it  is  possible  to  identify 
conditions  on  a  conditional  rewrite  system  similar  to  the  one  discussed  for  unconditional  rewrite  rules 
in  the  previous  section.  The  effectiveness  of  such  an  approach  is  unclear  because  firstly,  it  may  not  be 
possible  to  generate  a  canonical  conditional  rewrite  system,  and  then  secondly,  generating  additional 
ground  equalities  from  conditional  rewrite  rules  and  equalities  can  be  expensive.  In  Tecton,  we  have 
attempted  to  make  a  compromise.  We  do  not  assume  that  interpreted  symbols  are  defined  using 
a  canonical  conditional  rewrite  system.  We  also  do  not  perform  any  superpositions  between  the 
conditional  rewrite  system  and  the  ground  equalities  obtained  from  the  conditions  of  the  conclusion. 
So  the  implementation  is  incomplete,  but  our  experience  in  using  it  suggests  that  it  works  in  most 
situations.  Before  giving  all  the  steps  performed  in  our  preliminary  implementation,  we  will  use  a 
streamlined  example  to  illustrate  the  main  features  of  the  implementation. 

Example 5.2 A goal to prove is:

$(p(x) \land (z \le f(\max(x, y))) \land (0 < \min(x, y)) \land (x \le \max(x, y)) \land (\max(x, y) \le x)) \supset (z < g(x) + y)$

Among the rewrite rules in the rewrite system are the following two rewrite rules for the interpreted
symbols $\max$, $f$, $g$, $p$.

$R_1:\ \min(x, y) \to y$ if $\max(x, y) = x$
$R_2:\ f(x) \le g(x) \to \mathit{true}$ if $p(x)$

The goal is negated and Skolemized to give:

$p(A) \land (L \le f(\max(A, B))) \land (0 < \min(A, B))$

$\land\ (A \le \max(A, B)) \land (\max(A, B) \le A) \land \neg(L < g(A) + B)$.

(To save space, we do not use abstraction constants to abstract functional terms.) The literal $p(A)$
is the only one not belonging to the Presburger language. The set of linear inequalities is:

$C_1:\ -f(\max(A, B)) + L \le 0$, $C_2:\ -\min(A, B) \le -1$, $C_3:\ -\max(A, B) + A \le 0$,

$C_4:\ \max(A, B) - A \le 0$, $C_5:\ g(A) + B - L \le 0$.

Because $C_3 + C_4 = 0 \le 0$, the implicit equality

$E_1:\ \max(A, B) = A$

is derived by Fourier's algorithm. This equality is used to simplify the inequality set to produce the
following inequality set:

$C_1':\ -f(A) + L \le 0$, $C_2:\ -\min(A, B) \le -1$, $C_5:\ g(A) + B - L \le 0$.
Now $L$ can be eliminated to give:

$C_2:\ -\min(A, B) \le -1$, $C_5':\ -f(A) + g(A) + B \le 0$.

Since $\min(A, B)$ matches the left side of rule $R_1$ using the substitution $\{x \to A,\ y \to B\}$, its
condition $\max(A, B) = A$ must be established. This equality is already known, so $\min(A, B)$ is
reduced to $B$; the equality $\min(A, B) = B$ is also added to the equality set. Assuming an ordering
$f > g$, the term $f(A)$ also matches a maximal term in the linear rule $R_2$, so the condition $p(A)$ must be
established, which is already in the equality set. So, the instance of the linear rule, $f(A) \le g(A)$, is
added to the inequality set, giving:

$C_2':\ -B \le -1$,

$C_5':\ -f(A) + g(A) + B \le 0$,

$C_6:\ f(A) - g(A) \le 0$.

Fourier's algorithm detects a contradiction because an inequality $0 \le -1$ is generated, implying that
the original goal is proved.
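The final contradiction can be spelled out by adding the three remaining inequalities, namely $-B \le -1$, $-f(A) + g(A) + B \le 0$ and $f(A) - g(A) \le 0$. The derivation below is a worked restatement of that last step, with the functional terms $f(A)$ and $g(A)$ treated as atoms:

```latex
\begin{align*}
(-B) + \bigl(-f(A) + g(A) + B\bigr) + \bigl(f(A) - g(A)\bigr) &\le -1 + 0 + 0 \\
0 &\le -1
\end{align*}
```

Every term on the left cancels, so the sum is the contradictory inequality $0 \le -1$.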

5.1 Implementation

In  this  subsection,  we  give  a  brief  sketch  of  our  preliminary  implementation.  Many  design  decisions 
were  greatly  influenced  by  the  discussion  in  Boyer  and  Moore's  report  [3]  in  which  they  extensively 
discussed  data  structures  for  representing  a  data  base  of  contexts  and  the  interaction  between  the 
rewriter  and  the  decision  procedure. 

Our  preliminary  implementation  works  as  follows.  Given  a  goal  G,  it  is  negated  and  Skolemized. 
The  negated  Skolemized  goal  may  be  divided  into  many  subgoals  by  splitting  at  the  top  most  level 
if there is a disjunction. Each of the subgoals, which is of the form $L_1 \land \cdots \land L_k$ where each $L_i$ is a
literal, is attempted for unsatisfiability. From the $L_i$'s, a set $GE$ of ground equalities (including literals
such as $p(a)$ or $p(a) = \mathit{false}$) is collected. Below, we give the steps:

1.  Ground  completion  is  performed  to  generate  a  canonical  rewrite  system  for  ground  equalities. 
If  a  contradiction  is  detected,  the  subgoal  is  unsatisfiable.  Otherwise,  the  canonical  rewrite 
system  is  used  to  normalize  inequalities. 

2.  The  normalized  inequalities  are  passed  to  Fourier's  algorithm  for  elimination,  (a)  Whenever 
an  implicit  equality  is  generated,  it  is  passed  to  step  1,  which  is  repeated,  (b)  If  a  contradiction 
is  detected,  then  the  subgoal  is  unsatisfiable.  After  steps  1  and  2,  if  no  progress  can  be  made, 
go  to  the  next  step. 

3.  A  maximal  literal  (term)  among  all  the  ground  inequalities  and  ground  equalities  generated 
is  selected.  This  literal  is  checked  for  matchability  with  the  left  side  of  a  rule  defining  an 
interpreted symbol. In the case of a linear rule (such as $R_2$ above relating $f$, $g$, $p$), a maximal
term in the left side of a rule is used for checking a possible match.

Suppose $\theta$ is a substitution under which a match is possible with a rule; it is then checked
whether the condition in the rewrite rule, if any, instantiated by $\theta$ follows from the remaining
literals (ground equalities and inequalities generated so far) and other rewrite rules defining
interpreted symbols appearing in the condition. This is done using contextual rewriting [29, 30].
If the condition in the rewrite rule is established, then the subterm in the goal matching the left
side is replaced by its right-hand side instantiated with $\theta$. The above steps are then repeated
on the result.

This  process  of  checking  for  matchability  is  itself  recursive,  as  it  may  involve  doing  steps  1,  2, 
and  3  on  conditions  (on  a  smaller  input  in  a  well-founded  order). 

If  at  any  step,  it  is  not  possible  to  reduce  a  maximal  literal,  then  the  procedure  terminates 
declaring  that  it  is  unable  to  prove  the  subgoal  and  hence,  the  original  goal. 
Further Extensions


We have used the Tecton proof system to verify properties of many problems, including sorting
algorithms such as insertion sort and quicksort, string matching algorithms, the termination of
Takeuchi's function [22], and many efficient iterative programs for computing arithmetic functions
using the so-called Russian peasant algorithm. Recently, we have also verified properties of parallel
programs expressed using a powerlist data structure and recursion as proposed by Misra [21]. The
use of the procedure for Presburger arithmetic has made the proofs compact and relatively easier to
automate and understand in contrast to proofs generated without using Presburger arithmetic.

In our initial implementation, Fourier's algorithm detected inconsistency if an inequality $0 \le c$
was derived where $c < 0$. No attempt was made to check for unsatisfiability of integral equalities,
or  to  check  whether  a  satisfying  assignment  could  be  generated  from  defining  constraints  on  the 
variables  eliminated.  In  the  case  of  natural  numbers,  we  limited  the  amount  of  case  analysis  by 
considering  only  some  of  the  conditions  generated  from  Peano’s  subtraction  operation.  This  was 
again  based  on  efficiency  considerations.  We  are  currently  extending  our  implementation  to  include 
additional  checks  discussed  in  this  paper.  Further,  some  aspects  of  computing  superposition  of 
ground  equalities  with  conditional  rewrite  rules  defining  interpreted  function  symbols  that  can  be 
performed  efficiently,  would  also  be  included,  and  this  is  likely  to  extend  the  class  of  formulas  that 
the  Tecton  system  would  be  able  to  handle  automatically. 

Acknowledgement:  We  thank  Mahadevan  Subramaniam  for  helpful  comments  on  earlier  drafts 
of  the  paper. 
Abstract

The three most important resultant formulations are the Macaulay,
Dixon and sparse resultant formulations. For most
polynomial systems, however, the matrices constructed in
these formulations become singular and the projection operator
vanishes identically. In such cases, perturbation techniques
for the Macaulay formulation such as the generalized
characteristic polynomial (GCP), and a method based on rank
submatrix computation (RSC), applicable to all three formulations,
can be used, giving four methods, Macaulay/GCP,
Macaulay/RSC, Dixon/RSC and Sparse/RSC, for computing
nontrivial projection operators.

In this paper, these four methods are compared. It is
shown that the Dixon matrix is (by a factor up to $O(e^n)$
for a certain class) smaller than the sparse resultant matrix,
which is (by a factor up to $O(e^n)$ for a certain class) smaller
than the Macaulay matrix. Empirical results confirm that
Dixon/RSC is the most efficient, followed by Sparse/RSC,
then Macaulay/RSC and finally Macaulay/GCP, which is
found to be almost impractical. All four methods are found
to generate extraneous factors in the projection operator.

Efficient  heuristics  for  interpolation,  used  to  expand  the 
resultant  matrices,  are  also  discussed. 

1  Introduction 

Solving a set of nonlinear polynomial equations or deriving conditions for the
existence of their solutions is a fundamental problem in many areas of
mathematics, engineering, physical and computer sciences. These methods
sometimes involve elimination of variables from a given set of polynomial
equations to obtain a set of polynomials whose vanishing is a necessary (and
sometimes sufficient) condition for the existence of solutions. Elimination of
n variables from n + 1 equations results in a single such polynomial known as
the resultant. In recent years, as computational power has increased, a lot of
research has been devoted to eliminating variables symbolically and computing
the resultant.

* Supported in part by a grant from United States Air Force Office
of Scientific Research AFOSR-91-0361.


The resultant of a system of polynomials is an element of a Gröbner basis of
its ideal [3] if an elimination monomial ordering is used, but this is an
expensive way to compute the resultant. Until recently, a classical method by
Sylvester was used to compute the resultant [9]. This method eliminates one
variable from two polynomials, so to eliminate n variables from n + 1
polynomials, it has to be applied successively. It turns out that more
efficient methods, which eliminate all variables together from the set of
polynomials, had been developed at the beginning of this century by
mathematicians such as Cayley [7], Dixon [10] and Macaulay [20]. These
multivariate resultant formulations were recently resurrected by many
researchers, and have numerous applications [21, 8, 17].

Typically, these multivariate resultant formulations try to express resultants
as formulas involving determinants. Ideally, one wants a single matrix whose
determinant is the resultant; however, this is not always possible¹. In
general, one has to settle for formulations which either express the resultant
as a function of more than one determinant or give a single matrix whose
determinant is a nontrivial multiple of the resultant. In this paper, we
concentrate on such approaches.

Three major multivariate resultant formulations are the Macaulay [20, 4],
Dixon [10, 18] and sparse [22, 5] resultant formulations. Given a set of
polynomials, these formulations construct matrices, called the Macaulay
matrix, the Dixon matrix and the sparse resultant matrix, respectively. In the
Macaulay formulation, the ratio of the determinants of the Macaulay matrix and
one of its submatrices gives the resultant or some multiple of the resultant²
(called a projection operator). In the Dixon and sparse resultant
formulations, the determinant of the respective matrix gives a projection
operator. Unfortunately, when working with nongeneric, nonhomogeneous
polynomial systems, some or all of these three matrices can become singular
and the projection operator vanishes identically. Such trivial projection
operators are useless and give no information about the affine solutions of
the system.

One  way  to  extract  a  nontrivial  projection  operator  is 
by  perturbation  of  the  polynomial  system  [15,  4].  Ad  hoc 
perturbations  can  be  used  with  all  three  formulations,  but 
they  may  not  work.  For  Macaulay  formulation,  Canny  [4] 

¹ Weyman and Zelevinsky [24] classify certain cases in which a pure
determinantal formulation is possible. We have been informed by a
referee that Jouanolou's work also discusses cases which permit pure
determinantal formulation.

² In the case of generic homogeneous polynomials, the Macaulay formulation
results in the exact resultant.


gave a general method to perturb any polynomial system,
compute its Generalized Characteristic Polynomial (GCP)
and extract a nontrivial projection operator from the GCP.
We call this the Macaulay/GCP method.

Another method for extracting a nontrivial projection operator in the face of
singular resultant matrices was outlined in our previous paper [18]. This
method uses rank subdeterminant computation (RSC) which, though developed in
conjunction with the Dixon formulation, is applicable to all three
formulations. This gives three more methods, Macaulay/RSC, Dixon/RSC and
Sparse/RSC, giving a total of four methods for computing nontrivial projection
operators. However, the choice between these four methods is not clear, as
they have never been compared.

The purpose of this paper is to compare these four methods. We give
theoretical bounds on the relative sizes of the three resultant matrices in
the worst case for two different measures, viz., the total degree of the
polynomials and the degree of the polynomials in individual variables. It is
found that the Dixon matrix is the smallest, followed by the sparse resultant
matrix, and the Macaulay matrix is the largest. In fact, in the worst case
under the individual variable degree measure, the Dixon matrix is smaller by
an exponential ($O(e^n)$) factor than the sparse resultant matrix, which in
turn is smaller by an exponential (again, $O(e^n)$) factor than the Macaulay
matrix. Moreover, we show by example that all four methods suffer from
extraneous factors.

The comparison is supported by empirical results on 11 problems
(implicitization, geometry formula derivation, equilibrium of Lorentz system,
vision, etc.) and 3 random examples. Dixon/RSC seems most efficient and was
able to solve all 14 problems, followed by Sparse/RSC, which also solved all
problems but took much (up to 50 times) longer. Next was Macaulay/RSC, which
could not solve 3 problems and, on the others, took much longer than both
Sparse/RSC and Dixon/RSC. Macaulay/GCP was almost impractical due to the
introduction of an extra perturbation variable, and could solve only 2
problems. The software we used is described in section 5. For comparison
purposes, we also give timings for Gröbner basis construction.

In section 2, we outline the three multivariate resultant formulations and
point out their limitations. In section 3, we give two ways to get around
these limitations and extract a nontrivial projection operator. In section 4,
we give theoretical bounds on the relative sizes of the three resultant
matrices. In section 5, implementational details of the four methods and some
improvements on Zippel's interpolation algorithm are outlined. In section 6,
we compare the four methods and give empirical evidence. Section 7 concludes,
giving some future research problems.

2  Multivariate  Resultant  Formulations 
2.1  Notation  and  Preliminaries 


smallest necessary condition, which we call the resultant. Multiples of the
resultant are called projection operators and their vanishing is also a
necessary condition for $\mathcal{F}$ to have a common solution. The resultant
may be reducible³, but if a projection operator is irreducible, it must be the
resultant. All factors in a projection operator besides the resultant are
extraneous factors. If a polynomial system has solutions in $\mathbb{C}^n$
under all possible specializations of the parameters (i.e., the system has an
affine variety of dimension $> m - 1$ in $\mathbb{C}^{n+m}$), then the
resultant is identically zero. Also, an identically zero polynomial trivially
qualifies as a projection operator for any system of polynomials, since it is
a multiple of the resultant with an extraneous factor of zero. Such trivial
projection operators are obviously useless as they give no information about
the solutions of polynomial systems. For polynomial systems whose affine
variety is of dimension $m - 1$ in $\mathbb{C}^{n+m}$, we are interested in
computing projection operators which are not identically zero, at least when
the system has a nontrivial common solution (i.e., a solution in the algebraic
torus $(\mathbb{C} - \{0\})^n$ for some specialization of the parameters), and
which hopefully have few extraneous factors.

Let $J \subset \mathbb{Q}[X, A]$ be the ideal generated by $\mathcal{F}$. A
straightforward way to compute the resultant is by computing the Gröbner basis
of J using an appropriate elimination ordering. The resultant is then an
element of the basis. However, Gröbner basis methods are inefficient. Consider
the following alternative approach.

Find another set of polynomials $\mathcal{F}'$ with the following
two properties:

1. The set of solutions of $\mathcal{F}'$ is preferably equal to, or a superset
(as small as possible) of, the set of solutions of $\mathcal{F}$.

2. The number of different terms (in X) contained in the polynomials of
$\mathcal{F}'$ is exactly the number of polynomials in $\mathcal{F}'$.

If such an $\mathcal{F}'$ can be found, then it can be viewed as a linear
system of homogeneous equations by treating each term in X (including 1) as an
independent variable. A projection operator can then be found as the
determinant of the coefficient matrix of this linear system.

The three most efficient methods for computing the resultant, viz. the
Macaulay, Dixon and sparse resultant formulations, try to find such an
$\mathcal{F}'$. In this section we briefly outline these formulations. Our
objective is not to describe the methods in full detail, but rather to give
just enough information so that the reader can get an idea of the resultant
matrix sizes and limitations of these formulations. Before we do that, we give
an example and its resultant. This example is used throughout the paper to
illustrate the applicability or limitations of different methods.

Example: (Resultant) Let $\mathcal{F} = \{p_1, p_2, p_3\} \subset \mathbb{Q}[a, b, c, x, y]$
where


Let $X = \{x_1, \ldots, x_n\}$ be a set of n variables and
$A = \{a_1, \ldots, a_m\}$ a set of m parameters distinct from X. Let
$\mathcal{F} = \{p_1, \ldots, p_{n+1}\} \subset \mathbb{Q}[X, A]$ be a set of
n + 1 nonhomogeneous polynomials. Let $tdeg(p, X)$ be the total degree of
the polynomial p in the variable set X and $deg(p, x_j)$ the
degree of the polynomial p in the variable $x_j$.

The objective is to eliminate the variables X from $\mathcal{F}$ and
obtain a polynomial in the parameters A whose vanishing
is a necessary condition for the existence of solutions to $\mathcal{F}$.
Note that there may not exist a condition which is also sufficient
in the nonhomogeneous case. However, there is the


$p_1 = a x^2 + b x y + (b + c - a) x + a y + 3(c - 1),$

$p_2 = 2a^2 x^2 + 2ab\,xy + ab\,y + b^3,$

$p_3 = 4(a - b) x + c(a + b) y + 4ab.$

The variables x and y are to be eliminated to obtain the
resultant, a polynomial in the parameters a, b and c. From

³ For generic polynomials the resultant is always irreducible; however,
under specializations, the resultant can be reducible. E.g., the
resultant of $ax + b = 0$ and $bx + a = 0$ is $(a - b)(a + b)$.


the Gröbner basis of this system, the resultant is found to
be the following irreducible polynomial with 75 terms:

p(a,  b,  c)  =  144ca^h^  +  28a^c^b®  -  144a^b‘*  +  36c^a^b^  -  72c®o®b^  +  4cb® 

+  -  12c^o®b®  -  240a^b^  -  264a^c^b  +  336o^c^b  -  144a^cb 

-  72a®cb  -  384a'*  cb^  +  36a®c^  +  112a®  b^  ~  8co^b^  -  80a®b'*  -  32a^6^ 
+  20ca‘*b‘*  +  96ba®  +  48a^b®  +  288a^b®  -  96c^b^  +  192a^b®  -  ISa^b''’ 

40a^c^b^  192a®b^  *{•  16a®b^c  *4*  16a^b®c  —  8a^b  c  —  24o  c  b 

-  8a^b®c^  +  2a^c^b’^  -  12a^b^c  +  80a®b®  +  20ca^b®  -  264c^a^b^ 

-  96a®c^b  -  aOa^b^c^  +  320o^cb^  -  40o*cb®  -  16a^c^b^  +  44a^c^b^ 

-  216o^cb^  +  2a^c^b^  +  16o®cb^  +  108a®6^c^  -  20o®b®c  -  28a®c^b‘* 

-  8a^c^b®  -I-  2a®c^b®  +  72a^c^b  +  4a^c^b^  +  36a®c‘*b^  +  2ac^b® 

-  24c^a^b^  +  120c®o^b®  +  64a''’b2  +  72ca2b^  -  52c^ab®  +  96a®cb 

-  4cab'''  +  2c2ab'’'  +  48cab®  ~  16a^b®  +  ac^b®  +  4c®ob®  -  24a®c^b 

-  24a^b^c®  +  216a^cb^  +  40a^c®b^  -  48a^b^  +  Sea^c"*  -  72a®c®.  O 


2.2  The  Macaulay  Formulation 

For $1 \le i \le n + 1$, let $d_i = tdeg(p_i, X)$ and
$d_M = 1 + \sum_{i=1}^{n+1} (d_i - 1)$. Let the set of terms
$T = \{x_1^{\alpha_1} \cdots x_n^{\alpha_n} \mid \alpha_1 + \alpha_2 + \cdots + \alpha_n \le d_M\}$.
Each polynomial $p_i$ in $\mathcal{F}$ is multiplied with certain terms to
generate $\mathcal{F}'$ with |T| equations in |T| unknowns (which are terms in
T). The coefficient matrix of $\mathcal{F}'$ so constructed is the Macaulay
matrix. Columns of the Macaulay matrix are labeled by the terms in T in some
order, and each row corresponds to a polynomial in $\mathcal{F}$ multiplied by
a certain term.

The order in which polynomials are considered for selecting multipliers
results in different (but equivalent) Macaulay matrices.

Let $M_\sigma$ denote the Macaulay matrix for a permutation $\sigma$
of polynomials in $\mathcal{F}$. Macaulay also defined $N_\sigma$, a certain
square submatrix of $M_\sigma$, such that the projection operator
is the ratio of the determinants of $M_\sigma$ and $N_\sigma$. See [20, 4, 17]
for details of the construction and proofs of these properties.

Example: (Macaulay) For the example at the beginning of this section,
$T = \{x^3, x^2y, x^2, xy^2, y^3, y^2, xy, x, y, 1\}$. $\mathcal{F}'$ is
obtained by multiplying $p_1$ by x, y and 1, $p_2$ by x, y and 1,
and $p_3$ by xy, x, y and 1. The Macaulay matrix, $M_\sigma$, is the
following 10 × 10 matrix:


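\[
\begin{array}{c|cccccccccc}
 & x^3 & x^2y & x^2 & xy^2 & y^3 & y^2 & xy & x & y & 1 \\ \hline
x \times p_1 & a & b & b+c-a & 0 & 0 & 0 & a & 3(c-1) & 0 & 0 \\
y \times p_1 & 0 & a & 0 & b & 0 & a & b+c-a & 0 & 3(c-1) & 0 \\
p_1 & 0 & 0 & a & 0 & 0 & 0 & b & b+c-a & a & 3(c-1) \\
x \times p_2 & 2a^2 & 2ab & 0 & 0 & 0 & 0 & ab & b^3 & 0 & 0 \\
y \times p_2 & 0 & 2a^2 & 0 & 2ab & 0 & ab & 0 & 0 & b^3 & 0 \\
p_2 & 0 & 0 & 2a^2 & 0 & 0 & 0 & 2ab & 0 & ab & b^3 \\
xy \times p_3 & 0 & 4(a-b) & 0 & c(a+b) & 0 & 0 & 4ab & 0 & 0 & 0 \\
x \times p_3 & 0 & 0 & 4(a-b) & 0 & 0 & 0 & c(a+b) & 4ab & 0 & 0 \\
y \times p_3 & 0 & 0 & 0 & 0 & 0 & c(a+b) & 4(a-b) & 0 & 4ab & 0 \\
p_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 4(a-b) & c(a+b) & 4ab
\end{array}
\]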

and  Na  is 


Pi 

P2 


X  y 

(2^  o)- 


Unfortunately, here both $\det(M_\sigma)$ and $\det(N_\sigma)$ are 0, so
the resultant or a nontrivial projection operator cannot be
computed directly. □

Limitation: As is demonstrated in this example, this formulation may be
unsuccessful in computing a nontrivial projection operator. One of the
reasons is that $\det(N_\sigma)$ can be identically zero, so that the
division cannot be carried out. Even if that is not the case,
$\det(M_\sigma)$ can be identically zero, hence giving only a trivial
projection operator.
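The construction in this subsection can be checked numerically. The following is a minimal sketch, not the implementation used for the experiments: it builds the Macaulay matrix of the running example for one specialization of the parameters and inspects its rank. The polynomials are taken as read from the example above, and the helper names (`polys`, `macaulay_matrix`, `rank`) and the sample values a = 2, b = 3, c = 5 are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def polys(a, b, c):
    """Running-example polynomials as {(deg_x, deg_y): coefficient} (assumed reading)."""
    p1 = {(2, 0): a, (1, 1): b, (1, 0): b + c - a, (0, 1): a, (0, 0): 3 * (c - 1)}
    p2 = {(2, 0): 2 * a * a, (1, 1): 2 * a * b, (0, 1): a * b, (0, 0): b ** 3}
    p3 = {(1, 0): 4 * (a - b), (0, 1): c * (a + b), (0, 0): 4 * a * b}
    return p1, p2, p3

# Column labels: every term x^i y^j with i + j <= d_M = 1 + (2-1) + (2-1) + (1-1) = 3.
TERMS = [(3, 0), (2, 1), (2, 0), (1, 2), (0, 3), (0, 2), (1, 1), (1, 0), (0, 1), (0, 0)]

def macaulay_matrix(a, b, c):
    """One row per (polynomial, multiplier) pair, using the multipliers of the example."""
    p1, p2, p3 = polys(a, b, c)
    spec = ([(p1, m) for m in [(1, 0), (0, 1), (0, 0)]]
            + [(p2, m) for m in [(1, 0), (0, 1), (0, 0)]]
            + [(p3, m) for m in [(1, 1), (1, 0), (0, 1), (0, 0)]])
    matrix = []
    for poly, (mx, my) in spec:
        row = [Fraction(0)] * len(TERMS)
        for (ex, ey), coeff in poly.items():
            row[TERMS.index((ex + mx, ey + my))] += Fraction(coeff)
        matrix.append(row)
    return matrix

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

M = macaulay_matrix(2, 3, 5)
y3_column = [row[TERMS.index((0, 3))] for row in M]
print(len(M), "x", len(M[0]), "rank:", rank(M), "y^3 column zero:", not any(y3_column))
```

Under this reading, no multiplied polynomial contains the term y^3, so that column is zero for every choice of parameters, which is one way to see why the determinant vanishes identically.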


2.3  The  Dixon  Formulation 

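Let $\bar{X} = \{\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\}$ be a new set of
variables and

\[
\delta(X, \bar{X}) =
\begin{vmatrix}
p_{1,1} & \cdots & p_{1,n+1} \\
p_{2,1} & \cdots & p_{2,n+1} \\
\vdots & & \vdots \\
p_{n,1} & \cdots & p_{n,n+1} \\
p_1(\bar{x}_1, \ldots, \bar{x}_n) & \cdots & p_{n+1}(\bar{x}_1, \ldots, \bar{x}_n)
\end{vmatrix},
\]

where for $1 \le j \le n$ and $1 \le i \le (n + 1)$,

\[
p_{j,i} = \frac{p_i(\bar{x}_1, \ldots, \bar{x}_{j-1}, x_j, \ldots, x_n)
              - p_i(\bar{x}_1, \ldots, \bar{x}_j, x_{j+1}, \ldots, x_n)}{x_j - \bar{x}_j},
\]

and $p_i(\bar{x}_1, \ldots, \bar{x}_k, x_{k+1}, \ldots, x_n)$ stands for
uniformly replacing $x_j$ by $\bar{x}_j$ for $1 \le j \le k$ in $p_i$. The
polynomial $\delta$ is known as the Dixon polynomial. Now, $\mathcal{F}'$ is
the set of all coefficients (which are polynomials in X) of terms in $\delta$,
when viewed as a polynomial in $\bar{X}$. The coefficient matrix of
$\mathcal{F}'$ is known as the Dixon matrix. It can be proven that for generic
n-degree⁴ polynomials, the Dixon matrix is square, hence its determinant is a
projection operator [10, 18].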
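For a single variable (n = 1) the Dixon polynomial reduces to the classical Cayley form $(p(x)q(\bar{x}) - p(\bar{x})q(x))/(x - \bar{x})$, which is small enough to work out directly. The sketch below is an illustration under that assumption only; the function name `dixon_matrix_univariate` and the coefficient-list representation are hypothetical choices, not part of the original implementation.

```python
from fractions import Fraction

def dixon_matrix_univariate(p, q):
    """Coefficient matrix B of (p(x)q(xb) - p(xb)q(x)) / (x - xb) = sum B[i][j] x^i xb^j.

    p and q are coefficient lists, p[i] being the coefficient of x^i.
    """
    n = max(len(p), len(q)) - 1
    p = list(p) + [0] * (n + 1 - len(p))
    q = list(q) + [0] * (n + 1 - len(q))
    B = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n + 1):
        for j in range(i):
            w = Fraction(p[i]) * q[j] - Fraction(p[j]) * q[i]
            # (x^i xb^j - x^j xb^i) / (x - xb) = sum_{k=0}^{i-j-1} x^(i-1-k) xb^(j+k)
            for k in range(i - j):
                B[i - 1 - k][j + k] += w
    return B

# a x + b and b x + a with a = 3, b = 2: the single entry is a^2 - b^2 = 5.
linear = dixon_matrix_univariate([2, 3], [3, 2])

# (x - 1)(x - 2) and (x - 1) share the root x = 1, so the 2 x 2 determinant vanishes.
shared = dixon_matrix_univariate([2, -3, 1], [-1, 1])
shared_det = shared[0][0] * shared[1][1] - shared[0][1] * shared[1][0]
print(linear, shared, shared_det)
```

The first call reproduces the reducible resultant $(a - b)(a + b)$ mentioned in the footnote on specializations; the second shows the determinant vanishing when the two polynomials have a common root.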

Example:  (Dixon)  The  Dixon  matrix  for  the  previous 
example  is  of  dimension  2x3: 


1 

^4ca^b^  -  c^b^  +  12cab^ 
-12ca^b  +  12a^b  -  12ab2 
—  c^b^a  +  cb^a^  —  4ab^ 
-cb®  -  4a®b2  +  8a2b® 


4a®b^  -  8a'*b 

—  6a® c  +  6a® 
+6a2c2b  -  eca^b 

—  cb® a^  —  c6^  a 


y 


6a^c^b  -  6cab^ 
+4a2b®  -  8a®b2 
—  cb®  —  cb^a 
-•6ca^b  +  6ac^b^ 


I  800^62  _  4^,5  +  24cab2 
I  -SOca^b  -f.  240^6  _  24ab2 

I  I  — 6a®c  +  6a®c2  —  cb®a2 
I  — 8a^b  —  4a®b2  —  cb^a 
\+8a2b3  +  6a^c^b  +  4ab^ 


12a®b  -  4o2b2 
~8a^  -I-  2a®c^ 
+2ca^b^  -  2o‘*c 
+2a2c2b 


2ac2b2  4.  12a2b2 
+2a2c2b  -  2a®cb 
-8a®b  -  4b®a 
+2cb®a 


Since the Dixon matrix is rectangular, its determinant cannot be computed.
Hence the resultant cannot be computed by this formulation either. □

Limitation: Again, this example demonstrates the limitation of the Dixon
method. The Dixon matrix can be rectangular, which eliminates the possibility
of computing the determinant. Even if it is square, it may be singular,
resulting in a trivial projection operator.


2.4  The  Sparse  Resultant  Formulation 


This approach is based on recent results pertaining to sparse polynomial
systems (Bernshtein [2], Gelfand et al. [16] and Sturmfels [22]). Sparse
resultants appeared in Sturmfels & Zelevinsky [23]; their matrix formulation
was given by Canny & Emiris [5] and was successively improved in [6, 11, 12]
using better heuristics. The Bezout bound on the number of solutions to a set
of polynomials is quite loose. Instead, this formulation uses the recently
developed, and significantly tighter, BKK bound [16] to construct a smaller
resultant matrix than Macaulay.

The Newton polytope of a polynomial p is the convex hull of the set of
exponents (treated as points in $\mathbb{R}^n$) of all terms in p. The
Minkowski sum of two Newton polytopes Q and S is the set of all vector sums,
$q + s$, $q \in Q$, $s \in S$. Let $Q_i \subset \mathbb{R}^n$ be the Newton
polytope (with respect to X) of the polynomial $p_i$ in


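⁴ n + 1 nonhomogeneous polynomials $p_1, \ldots, p_{n+1}$ in $x_1, \ldots, x_n$
are called generic n-degree if all coefficients are independent parameters
and there exist nonnegative integers $m_1, \ldots, m_n$ such that each
$p_j = \sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} a_{j,i_1,\ldots,i_n} x_1^{i_1} \cdots x_n^{i_n}$ for $1 \le j \le n + 1$.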


$\mathcal{F}$. Let $Q = Q_1 + \cdots + Q_{n+1} \subset \mathbb{R}^n$ be the
Minkowski sum of the Newton polytopes of all the polynomials in $\mathcal{F}$.
Let $\mathcal{E}$ be the set of exponents (lattice points of $\mathbb{Z}^n$)
in Q obtained after applying a small perturbation to Q to move as many
boundary lattice points as possible outside Q.

Construction of $\mathcal{F}'$ here is similar to the Macaulay formulation:
each polynomial $p_i$ in $\mathcal{F}$ is multiplied with certain terms to
generate $\mathcal{F}'$ with $|\mathcal{E}|$ equations in $|\mathcal{E}|$
unknowns (which are terms in $\mathcal{E}$), and its coefficient matrix is the
sparse resultant matrix. A projection operator for $\mathcal{F}$ is simply the
determinant of this matrix. Columns of the sparse resultant matrix are labeled
by the terms in $\mathcal{E}$ in some order, and each row corresponds to a
polynomial in $\mathcal{F}$ multiplied by a certain term.

Recall that the size of the Macaulay matrix is |T|. The size of the sparse
resultant matrix, $|\mathcal{E}|$, is typically smaller than |T|, especially
when the BKK bound is tighter than the Bezout bound. Algorithms in
[11, 12, 6] construct matrices using some greedy heuristics which may result
in smaller matrices, but in the worst case, the size can still be
$|\mathcal{E}|$.
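To make the role of the Newton polytopes concrete, the following sketch computes the perturbed lattice point set for the supports of the running example. It is an illustration only: the supports are taken as read from the example polynomials, the perturbation vector (1/100, 1/100) is an assumed stand-in for "a small perturbation in the positive direction", and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum(*supports):
    """All sums of one exponent vector from each support."""
    return {tuple(map(sum, zip(*combo))) for combo in product(*supports)}

def inside(hull, pt):
    """True if pt lies in the closed convex polygon with counter-clockwise vertices."""
    for k in range(len(hull)):
        o, a = hull[k], hull[(k + 1) % len(hull)]
        if (a[0] - o[0]) * (pt[1] - o[1]) - (a[1] - o[1]) * (pt[0] - o[0]) < 0:
            return False
    return True

# Exponent vectors (deg_x, deg_y) of p1, p2, p3 (assumed reading of the example).
support1 = [(2, 0), (1, 1), (1, 0), (0, 1), (0, 0)]
support2 = [(2, 0), (1, 1), (0, 1), (0, 0)]
support3 = [(1, 0), (0, 1), (0, 0)]

hull = convex_hull(minkowski_sum(support1, support2, support3))
eps = Fraction(1, 100)  # assumed perturbation size
max_x = max(p[0] for p in hull)
max_y = max(p[1] for p in hull)
# A lattice point lies in the shifted polytope iff the point minus the shift lies in Q.
E = sorted((i, j) for i in range(max_x + 2) for j in range(max_y + 2)
           if inside(hull, (i - eps, j - eps)))
print(hull, E, len(E))
```

The nine exponent pairs returned correspond to the nine column labels of a 9 × 9 matrix for this example.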

Example: (Sparse) Continuing with the same example, construction of the
Newton polytopes and their Minkowski sum (after a perturbation in the positive
direction) reveals that
$\mathcal{E} = \{xy, xy^2, xy^3, x^2y, x^2y^2, x^2y^3, x^3y, x^3y^2, x^4y\}$.
The 9 × 9 sparse resultant matrix is:


The  determinant  of  the  above  matrix  is  a  projection  oper¬ 
ator  with  205  terms  (in  expanded  form): 


for  the  perturbation  variable  to  get  the  projection  operator 
of  the  original  system.  Such  ad  hoc  perturbations  can  be 
used  in  conjunction  with  all  three  resultant  formulations, 
but  they  may  not  work.  The  following  general  perturbation 
mechanism  applicable  to  Macaulay  formulation  was  given 
by  Canny  in  [4]. 

Let s be a new perturbation variable. Create a new set of polynomials
$\mathcal{F}(s)$ by replacing each $p_i \in \mathcal{F}$ (for
$1 \le i \le n$) by $p_i - s x_i^{d_i}$ and $p_{n+1}$ by $p_{n+1} - s$.
Compute the projection operator, $q(s)$, of $\mathcal{F}(s)$ using the
Macaulay formulation. $q(s)$ is known as the Generalized Characteristic
Polynomial (GCP) and is a polynomial in s. The trailing coefficient (with
respect to s) of the GCP is a nontrivial projection operator of
$\mathcal{F}$. Note that the size of the Macaulay matrix is the same for
$\mathcal{F}$ and $\mathcal{F}(s)$. For a proof of the fact that the
projection operator so derived is not identically zero and that it does
vanish on all the affine zeros of the system, see [4]. We call this method
Macaulay/GCP. We now finish the previous example using this method.

Example: (Macaulay/GCP) For the example, $\det(M_\sigma)$ and
$\det(N_\sigma)$ are both identically zero, so we compute the GCP. Construct
$\mathcal{F}(s)$ and let $M_\sigma(s)$ and $N_\sigma(s)$ stand for its
Macaulay and denominator matrices. Then
$\det(N_\sigma(s)) = \det(N_\sigma - Is) = s(s - a)$, and
$\det(M_\sigma(s)) = \det(M_\sigma - Is)$ is a polynomial with 705 terms.
The GCP, $\det(M_\sigma(s)) / \det(N_\sigma(s))$, is a polynomial with 462
terms. Finally, the projection operator, which here is the constant term of
the GCP, has 159 terms (in expanded form):

$Q = c\,(a + b)\,(a^2 c + abc - 4ab + 4b^2)\; p(a, b, c).$

Notice that there are 3 extraneous factors. □

3.2  Rank Submatrix (RSC)

Another way to extract a nontrivial projection operator from
degenerate resultant matrices is through their submatrices.
The following method is applicable to all three formulations:


Q  —  — 2a^  (a  —  6  “  c)  (a^ c  +  abc  —  4ab  +  46^)  p  (a,  6,  c) . 

Notice  that  there  are  3  extraneous  factors.  □ 

Limitation: Even though we are able to obtain a nontrivial projection
operator in this particular example, this may not always be the case.
Although it seems rare, the sparse resultant matrix can be singular
(Example 2 in Table 3), in which case the determinant will be a trivial
projection operator.

3  Two  Ways  to  Extract  the  Projection  Operator 

There are two ways to deal with degenerate resultant matrices encountered in
all three formulations. One involves perturbation of the original set of
polynomials and the other involves rank submatrix computation.

3.1  Perturbation (GCP)

The main problem in all three formulations is that resultant matrices can be
rectangular, or even if square, they may be singular. Perturbation of
polynomials can result in square and nonsingular resultant matrices
[15, 4, 8]. One can try ad hoc perturbations of $\mathcal{F}$ in the hope of
obtaining a nontrivial projection operator as follows. Randomly perturb the
system using a perturbation variable, compute the projection operator for this
perturbed system, factor it, remove any extraneous factors and finally,
substitute zero


1.  Set up the resultant matrix (R) of $\mathcal{F}$.

2.  If any column in R is linearly independent of the other
columns, then return the determinant of any rank submatrix of R.

3.  Else, this heuristic fails.
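The three steps above can be sketched on a numeric matrix as follows. This is a minimal illustration under stated assumptions, not the authors' program: it works over exact rationals rather than symbolic entries, tests step (2) by checking whether the rank drops when a column is removed, and returns the product of the nonzero pivots, which equals the determinant of a maximal nonsingular submatrix up to sign. The names `echelon_pivots` and `rank_submatrix_determinant` are hypothetical.

```python
from fractions import Fraction

def echelon_pivots(matrix):
    """Gaussian elimination; returns the nonzero pivots (their count is the rank)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        pivots.append(m[r][col])
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            if f:
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return pivots

def rank_submatrix_determinant(matrix):
    """Product of pivots if some column is independent of the others, else None."""
    pivots = echelon_pivots(matrix)
    for j in range(len(matrix[0])):
        reduced = [list(row[:j]) + list(row[j + 1:]) for row in matrix]
        if len(echelon_pivots(reduced)) < len(pivots):
            result = Fraction(1)
            for p in pivots:
                result *= p
            return result
    return None  # step (3): the heuristic fails

# Singular 3 x 3 matrix whose last column is independent of the first two.
ok = rank_submatrix_determinant([[1, 2, 0], [2, 4, 0], [0, 0, 3]])
# Every column depends on the other one, so the condition of step (2) fails.
fail = rank_submatrix_determinant([[1, 1], [1, 1]])
print(ok, fail)
```

In the first call the rank submatrix is formed by rows 1 and 3 and columns 1 and 3, whose determinant is 3.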

The resultant matrix, R, can be any of the three resultant matrices, giving
three methods for computing a nontrivial projection operator, viz.
Macaulay/RSC, Dixon/RSC and Sparse/RSC. By rank submatrix we mean a maximal
nonsingular submatrix, and its determinant can be computed by performing
Gaussian elimination and returning the product of the (nonzero) pivots
[18, 14]. In an earlier paper, we required a more complicated algorithm since
we were deriving conditions for existence of solutions in $\mathbb{C}^n$, but
for solutions in the algebraic torus $(\mathbb{C} - \{0\})^n$, this simple
algorithm suffices. See [18] for details. Though there exist examples where
the condition in step (2) is not true, they seem rare. In all examples we
tried, R was degenerate many times, but the condition of step (2) was always
satisfied. For the continuing example in this paper, we had not been able to
compute a nontrivial projection operator using the original Macaulay and Dixon
formulations. We do that now using Macaulay/RSC and Dixon/RSC respectively.

Example: (Macaulay/RSC) It is easily confirmed that
the last column of the singular Macaulay matrix, $M_\sigma$, is linearly
independent of the other columns, hence the rank of
Table 1: Matrix Sizes, Worst Total Degree Case. Table 2: Matrix Sizes, Worst Ind. Var. Deg. Case.


the matrix (9) decreases (becomes 8) on its removal. The projection
operator is the product of the 9 pivots after Gaussian
elimination and has 159 terms (in expanded form):

$Q = a\,c\,(a + b)\,(a^2 c + abc - 4ab + 4b^2)\; p(a, b, c).$

Notice that there are 4 extraneous factors. □

Example: (Dixon/RSC) It is easily confirmed that the first column of the
rectangular Dixon matrix is linearly independent of the other columns, hence
the rank of the matrix (2) decreases (becomes 1) on its removal. We do
Gaussian elimination on the Dixon matrix, and the projection operator is the
product of the 2 pivots and has 75 terms (in expanded form):

Q  =  -6p(a,6,c). 

Again there is 1 extraneous factor, but the reader may observe that, among
all four methods, this one generates the fewest extraneous factors for this
example. □

4  Theoretical  Comparison  of  Matrix  Sizes 

The size of the resultant matrix is critical in determining the efficiency of
a given method. We compare the sizes of the three resultant matrices in the
worst case using two measures: (i) the total degree of the input polynomials
and (ii) the degree of the input polynomials in each variable.

Total Degree based: For $1 \le i \le (n + 1)$, let $tdeg(p_i, X)$ =
d.  It  is  easily  established  that  the  sizes  of  Dixon,  Macaulay 
and  sparse  resultant  matrix  are  bounded  by 
and  respectively.  These  formulas  show  that  the 

Dixon matrix is smaller⁶ than the other two. See Table 1
for typical worst case sizes in practice.

The total degree measure is not very fair to either the Dixon or the sparse
resultant. Better bounds are obtained for these two if the degree of each
polynomial in individual variables is used as the measure.

Variable Degree based: For $1 \le i \le (n + 1)$ and $1 \le j \le n$, let
$d = deg(p_i, x_j)$. Sizes of the Dixon, Macaulay and sparse resultant
matrices are bounded by $n!\,d^n$, $\binom{(n+1)nd}{n}$ and $((n+1)d)^n$
respectively. Using these formulas and Stirling's approximation for factorial
(for asymptotically large n and d), it can be proven that the Dixon matrix is
smaller, by an exponential ($O(e^n)$) factor, than the sparse resultant
matrix, which is smaller, also by an exponential ($O(e^n)$) factor, than the
Macaulay matrix, and transitively the Dixon matrix is smaller, by an
exponential ($O(e^{2n})$) factor, than the Macaulay matrix. Typical worst
case sizes are shown in Table 2. A (*) entry means the program ran out of
memory.

⁵ It should be pointed out that "sparse" is really a misnomer. Though the
algorithm for sparse resultant is not completely insensitive to zero terms,
the worst case is easily achieved for relatively sparse polynomials. E.g.,
for n = 2, d = 3, a set of three polynomials with the supports
(x^ , xy^ ,1) are relatively sparse (only 5 terms as opposed to possible 10),
but the sparse matrix still has the worst case size.

⁶ The formula for Dixon matrix size above is loose and the exact size is
always smaller. Also, more efficient algorithms for sparse resultants use
heuristics which may construct matrices of smaller size than the formula above
indicates. However, such heuristics are not considered here because the order
of the bound does not change.
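The exponential gaps can be made explicit with a short calculation. The sketch below assumes the worst-case bounds $n!\,d^n$ (Dixon), $\binom{(n+1)nd}{n}$ (Macaulay, obtained from $|T| = \binom{n + d_M}{n}$ with every total degree equal to $nd$) and $((n+1)d)^n$ (sparse); only the last of these survives legibly in the text, so the other two should be read as assumptions of this illustration.

\[
\frac{((n+1)d)^n}{n!\,d^n} = \frac{(n+1)^n}{n!}
\approx \frac{(n+1)^n}{\sqrt{2\pi n}\,(n/e)^n}
= \frac{e^n}{\sqrt{2\pi n}} \left(1 + \frac{1}{n}\right)^{n}
\approx \frac{e^{\,n+1}}{\sqrt{2\pi n}} = O(e^n),
\]

\[
\frac{\binom{(n+1)nd}{n}}{((n+1)d)^n}
\approx \frac{((n+1)nd)^n / n!}{((n+1)d)^n}
= \frac{n^n}{n!}
\approx \frac{e^n}{\sqrt{2\pi n}} = O(e^n)
\qquad (d \to \infty).
\]

Multiplying the two ratios gives the $O(e^{2n})$ gap between the Dixon and Macaulay bounds under the same assumptions.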

The two worst case scenarios sketched above provide convincing arguments that
the Dixon matrix is the smallest of the three, followed by the sparse
resultant matrix, and that the Macaulay matrix is the largest. That these
bounds are closely followed even in non-worst-case, real-world examples is
evident from the columns labeled Matrix Size in Table 3. We will use this
knowledge to carry out a comparison of the various methods in a later section.
For a comprehensive analysis of matrix sizes and comparison for
multi-homogeneous systems, the reader may wish to consult [19].

5  Implementational Details

Since the resultant matrices are symbolic (each entry being a polynomial in
the parameters), their determinants (or rank subdeterminants) are computed
using polynomial interpolation rather than direct Gaussian elimination.

Implementation of Macaulay/GCP: $M_\sigma(s)$ and $N_\sigma(s)$ (see section
3.1) for $\mathcal{F}(s)$ were constructed using our implementation in MAPLE.
We used our implementation of Zippel's sparse polynomial interpolation
algorithm [25] to compute their determinants. Division of the two determinants
and extraction of the projection operator from the GCP were done in MAPLE.

Implementation of RSC methods: We used our own MAPLE programs to construct the
Macaulay and Dixon matrices, and Emiris' C program based on the algorithm in
[11] to construct the sparse resultant matrix. The linear independence of any
column in these matrices was checked probabilistically by substituting random
numbers for parameters and performing the check on numeric (rather than
symbolic) matrices. Finally, the rank subdeterminant was interpolated using
our implementation of Zippel's sparse polynomial interpolation algorithm.

Implementation of Zippel's interpolation algorithm:
Given a resultant matrix, a blackbox is constructed which
returns the value of the polynomial (determinant or the rank
subdeterminant) for a particular value assignment to the
parameters. This is done by (i) evaluating each entry of the
symbolic resultant matrix after substituting parameter
values, then (ii) performing Gaussian elimination on the matrix
(which now has integer entries), and finally (iii) returning
the product of diagonal entries (for GCP) or nonzero pivots
(for RSC).
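The three blackbox steps can be sketched as follows. This is a minimal illustration under our own assumptions, not the MAPLE implementation described above: each symbolic entry is represented by a Python function of the parameter tuple, the prime `P` is an arbitrary choice, and the name `blackbox` and its return convention (rank plus signed pivot product) are hypothetical.

```python
P = 2_147_483_647  # the prime 2**31 - 1, chosen only for illustration


def blackbox(matrix, values, p=P):
    """Evaluate a symbolic matrix at `values`, then eliminate modulo p.

    `matrix` is a list of rows whose entries are functions of the
    parameter tuple (stand-ins for polynomial entries). Returns
    (rank, value), where value is the signed product of the nonzero
    pivots modulo p; for a nonsingular square matrix this is the
    determinant modulo p.
    """
    # (i) substitute the parameter values into every entry
    a = [[entry(values) % p for entry in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    rank, value = 0, 1
    # (ii) Gaussian elimination over the integers modulo p
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot_row = next((i for i in range(rank, n_rows) if a[i][col]), None)
        if pivot_row is None:
            continue  # no pivot in this column
        if pivot_row != rank:
            a[rank], a[pivot_row] = a[pivot_row], a[rank]
            value = -value  # a row swap flips the sign
        pivot = a[rank][col]
        value = value * pivot % p
        inv = pow(pivot, -1, p)
        for i in range(rank + 1, n_rows):
            factor = a[i][col] * inv % p
            if factor:
                a[i] = [(x - factor * y) % p for x, y in zip(a[i], a[rank])]
        rank += 1
    # (iii) return the product of the pivots found
    return rank, value % p


# Toy 2x2 matrix in parameters (a, b); its determinant is a^2 + a*b - b.
M = [[lambda v: v[0], lambda v: 1],
     [lambda v: v[1], lambda v: v[0] + v[1]]]
rank, det = blackbox(M, (2, 3))
```

Because the elimination runs on numbers rather than polynomials, the cost of one call depends only on the matrix dimensions, which is the property the later comparison of methods relies on.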
Table  3:  Empirical  Data


The interpolation algorithm then loops through m stages
(where m is the number of parameters), each time lifting
one parameter by making a number of calls to the
blackbox. At the end of the last stage, the polynomial has been
completely interpolated. See [25] for details.

We performed all computations in a finite field modulo a
large prime. If the exact projection operator is required, the
computations can be performed in various finite fields
modulo different primes until the product of the primes is larger
than a predetermined coefficient bound. Finally, exact
coefficients can be lifted from their images in the various finite
fields using the Chinese remainder theorem. If the coefficient
bound is unknown (or too loose), the coefficients can be lifted
successively after each finite field computation until they stabilize.
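The lifting step can be illustrated with a short sketch. It assumes one integer coefficient whose image modulo each prime is supplied by a hypothetical callback `image_mod` (standing in for one finite-field run), and it uses symmetric residues so that negative coefficients are recovered; the stopping rule "unchanged after one more prime" is the stabilization heuristic described above, not a proof of correctness.

```python
def crt_pair(r1, m1, r2, m2):
    """Combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t, m1 * m2


def symmetric(r, m):
    """Map a residue in [0, m) to its representative in (-m/2, m/2]."""
    return r - m if r > m // 2 else r


def lift_until_stable(image_mod, primes):
    """Lift a coefficient prime by prime until its value stops changing.

    Returns the lifted value, or the last candidate if the list of
    primes is exhausted before two consecutive candidates agree.
    """
    r, m, previous = 0, 1, None
    for p in primes:
        r, m = crt_pair(r, m, image_mod(p) % p, p)
        current = symmetric(r, m)
        if current == previous:
            return current
        previous = current
    return previous


# A hidden coefficient -4321, seen only through its images modulo small primes.
lifted = lift_until_stable(lambda p: -4321 % p, [101, 103, 107, 109, 113])
```

When a coefficient bound is known, the loop would instead stop as soon as the product of the primes exceeds twice the bound.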

In  practice,  the  most  expensive  operation  during  interpo¬ 
lation  is  the  blackbox  computation,  and  the  possibilities  for 
improvement  lie  in  reducing  the  number  of  calls  to  blackbox. 
Two  main  heuristics  we  used  are  as  follows: 

1. Estimating individual variable degree bounds:
Zippel's algorithm assumes the same degree bound
(say D) for each parameter at any stage. This
assumption is costly for later stages if the actual degree
of the parameter being lifted is less than D.
Fortunately, when the first parameter is being lifted, this
assumption is not too costly. Based on this
observation, using a crude bound D, we interpolate univariate
polynomials in all the parameters before starting
multivariate interpolation. From these univariate
polynomials in each parameter, we determine an individual
degree bound for each parameter. Later, when
interpolating the multivariate polynomial, we use these
individual degree bounds at each stage. This results in a
speedup of the algorithm since it reduces the number
of calls to blackbox.

2. Order of interpolation: Once the individual degree
bound for each parameter is known, it turns out that
the number of calls to blackbox is sensitive to the
order in which the parameters are lifted. If parameters
are lifted in increasing order of their individual degree
bounds, fewer calls are made to the blackbox, which
results in a speedup.
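Both heuristics can be sketched together. The code below is an illustration under our own assumptions rather than the implementation used for the timings: the blackbox is any Python function of a parameter list, the prime `P` and the function names are hypothetical, and the univariate degree is read off from Newton divided differences modulo p. The estimate is probabilistic, since an unlucky random choice for the other parameters could make a leading coefficient vanish.

```python
import random

P = 2_147_483_647  # the prime 2**31 - 1, chosen only for illustration


def univariate_degree(blackbox, n_params, k, bound, p=P, rng=random):
    """Estimate the degree of the hidden polynomial in parameter k.

    The other parameters are fixed at random nonzero values, the
    blackbox is sampled at bound + 1 points, and the largest index of
    a nonzero Newton divided difference gives the degree.
    """
    base = [rng.randrange(1, p) for _ in range(n_params)]
    xs = list(range(1, bound + 2))
    diffs = []
    for x in xs:
        point = list(base)
        point[k] = x
        diffs.append(blackbox(point) % p)
    degree = 0
    for order in range(1, len(xs)):
        # after this pass diffs[order] is the Newton coefficient of that order
        for i in range(len(xs) - 1, order - 1, -1):
            inv = pow(xs[i] - xs[i - order], -1, p)
            diffs[i] = (diffs[i] - diffs[i - 1]) * inv % p
        if diffs[order]:
            degree = order
    return degree


def lifting_order(blackbox, n_params, bound, p=P, rng=random):
    """Return (individual degree bounds, parameters by increasing degree)."""
    degrees = [univariate_degree(blackbox, n_params, k, bound, p, rng)
               for k in range(n_params)]
    order = sorted(range(n_params), key=lambda k: degrees[k])
    return degrees, order


# Toy blackbox for a^3*b + b^2 + 5*c + 1 with a crude common bound D = 6.
def toy(v):
    return v[0] ** 3 * v[1] + v[1] ** 2 + 5 * v[2] + 1


degrees, order = lifting_order(toy, 3, 6, rng=random.Random(0))
```

Each univariate pass costs only D + 1 blackbox calls, which is small compared with the calls saved in the later multivariate stages when the true degrees are well below D.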
Table  4:  Timings  -  direct  determinant  expansion  in  MAPLE.

Using these two heuristics, we are able to obtain a
substantial speedup over a straightforward implementation of
Zippel's algorithm.

6  Empirical  Results  and  Comparison 

All empirical results are listed together in Table 3. The
14 examples used in Table 3 can be obtained in MAPLE
format by anonymous ftp from
ftp://ftp.cs.albany.edu/pub/saxena/examples.14. All timings are in seconds on a
64Mb SUN Sparc10. A (*) in a Time column means either
that the resultant could not be computed even after running
for more than a day or that the program ran out of memory.
An N/R in the GCP column means there exists a polynomial
ordering for which the Macaulay and the denominator
matrix are nonsingular and therefore GCP computation is
not required. Besides resultant formulation timings, we also
present timings for computing the resultant using Grobner
basis construction with block ordering (variables in one block
and parameters in another) using the Macaulay [1] system.
Grobner basis computations are also done in a finite field, in
fact with a much smaller characteristic (31991) than in the
resultant computations. We also tried Grobner basis
computation using the GB system of Faugere [13], but it does not
support block ordering, and lexicographic ordering resulted in
much inferior performance.

Examples  3,  4  and  5  consist  of  generic  polynomials  with 
numerous  parameters  and  dense  resultants.  Interpolation 
methods  are  not  appropriate  for  such  examples  and  timings 
using  straightforward  determinant  expansion  in  MAPLE  are 
in  Table  4. 

^See  Table  4  for  timing  using  direct  determinant  expansion. 


In  all  four  methods,  the  time  taken  to  compute  the  resul¬ 
tant  depends  on  the  cost  of  the  two  steps  -  (i)  constructing 
the  resultant  matrix  and  (ii)  interpolating  the  resultant  from 
the  resultant  matrix. 

Cost  of  Matrix  Construction  is  insignificant  (see  the 
columns  labeled  Cnstr.  Time  in  Table  3)  for  all  formula¬ 
tions  on  all  examples.  It  should  be  noted  that  the  con¬ 
struction  of  Dixon  matrices  was  done  on  MAPLE  using  a 
straightforward  implementation  without  any  improvements. 
An  efficient  implementation  in  C  would  further  reduce  the 
already  insignificant  Dixon  matrix  construction  time.  So, 
the  total  time  to  compute  the  resultant  is  dominated  by  the 
interpolation  step. 

Interpolation Cost (see section 5) depends on the cost of
an individual blackbox call and the total number of calls to
blackbox. The cost of an individual call to blackbox includes
the cost of substituting values for parameters in the
matrix and the cost of performing Gaussian elimination on the
resulting matrix. Even though the cost of evaluating an
individual entry in the Dixon matrix is higher (because the entries
are polynomials of degree at most n + 1 in the coefficients
of the original polynomials), the number of such evaluations
is determined by the matrix size and is much smaller than
for Macaulay and sparse matrices. Consequently, the time
taken for substituting values for parameters in the Dixon
matrix is smaller than the time taken for substitution in the
Macaulay and sparse matrices.

The time taken by a call to blackbox is, however, totally
dominated by Gaussian elimination after substitutions have
been made into a matrix. Below we compare the four
methods based on the time taken to perform Gaussian elimination in
a blackbox call and the number of calls to the blackbox.

6.1  Dixon/RSC  vs.  Macaulay/RSC  vs.  Sparse/RSC

1. Cost of Gaussian Elimination: Once the
substitution phase is over, the matrix entries are only entries
from some finite field, and the cost of Gaussian
elimination depends solely on the size of the matrices. Due
to the much smaller size of the Dixon matrix, the time
taken for Gaussian elimination is much less than for
the other two. Since Gaussian elimination dominates
the blackbox call, the cost of an individual blackbox call
is the least for the Dixon matrix, followed by the sparse
and finally the Macaulay matrix.

2. Number of calls to blackbox: The number of calls
to blackbox depends on the structure of the polynomial
being interpolated. If there are no extraneous factors,
the polynomial being interpolated in all three cases is
the same, and so the number of calls to blackbox is also
the same in all three methods. If extraneous factors
are present, then the cost depends on the number of
terms in the projection operator. Usually the number
of terms is smaller if there are fewer extraneous factors
(note that this is not true in general).

Since the interpolation cost depends mostly on the two
factors discussed above, the total time taken to interpolate the
resultant is smallest for Dixon/RSC, followed by Sparse/RSC,
and largest for Macaulay/RSC.

This gradation of efficiency is reflected in Table 3. All
the examples were completed using the Dixon/RSC method.
Sparse/RSC also solved all the examples, but always took
much longer than Dixon/RSC. Macaulay/RSC could not
complete 2 problems and took much longer than Sparse/RSC
(for most problems) and Dixon/RSC (for all problems).


6.2  RSC  vs.  GCP 

1.  Cost  of  Gaussian  Elimination:  Size  of  the  resul¬ 
tant  matrix  of  a  polynomial  system  after  perturbation 
is  the  same  as  that  before  perturbation  (see  section 
3).  So,  the  cost  of  Gaussian  elimination  is  the  same  in 
GCP  and  RSC  methods. 

2. Number of calls to blackbox: Due to the
introduction of a new perturbation variable, one more iteration
(or stage) is needed during interpolation for that
variable. This results in more calls to all the procedures
including blackbox. Secondly, due to the new variable,
the number of terms in the polynomials at each stage
of interpolation also becomes larger. This raises the
number of calls to blackbox even further®.

In practice, this rise in the number of calls to blackbox
almost makes the GCP method impractical. On the other
hand, the cost of the RSC method remains the same. These
observations about the GCP method are confirmed by
comparing the entries of the column labeled Macaulay/GCP against
the columns Dixon/RSC, Macaulay/RSC and Sparse/RSC.

While working with the Macaulay formulation, it is often
(9 out of the 14 examples in Table 3) the case that one
of the two methods, RSC or GCP, needs to be resorted to,
and the analysis suggests that RSC should be chosen due to its
efficiency.

The GCP method does have a distinct merit in that it can
always extract the proper affine components of the variety,
whereas other methods may fail. However, the failure of
RSC methods seems to occur very rarely and they have been
successfully applied to all nontrivial examples in the paper.

7  Conclusion 

While  working  with  nongeneric,  nonhomogeneous  polynomi¬ 
als,  all  four  elimination  methods  generate  extraneous  factors 
in  the  projection  operators  (see  the  example  used  in  this  pa¬ 
per).  So  the  only  factor  in  making  a  choice  between  the  four 
methods  is  efficiency.  To  this  end,  we  find  that  the  Dixon 
formulation  coupled  with  rank  submatrix  computation  is  the 
fastest  way  to  compute  a  nontrivial  projection  operator  of  a 
set  of  polynomials.  The  main  reason  is  that  Dixon  matrix 
can  be  smaller,  by  a  factor  that  is  up  to  exponential  in  the 
number  of  variables,  than  the  other  two  resultant  matrices, 
viz.,  Macaulay  and  sparse. 

The comparison of matrix sizes was done in the worst
case scenario under two different measures. Being worst
cases, they do not take into account the sparsity of
polynomials encountered in the real world. Even though the
real-world examples we presented show that the Dixon matrix
is indeed much smaller than the other two, a theoretical
analysis which takes sparsity into account needs to be done.
Our preliminary work in this direction [19] shows that, like
the sparse resultant formulation, the Dixon formulation also
exploits the Newton polytopes and takes into account the
sparsity of the input polynomials, though in a different fashion.

Extraneous factors occur in all methods in this paper;
however, it is not clear which method generates the fewest factors.
Any bounds on the number of extraneous factors generated
by the four methods would be very useful. Another important
research direction is to discover ways of eliminating
extraneous factors from these methods without compromising their
efficiency.

®It may be possible to avoid the GCP and directly compute its
trailing coefficient, but this is likely to require as many blackbox calls.
Abstract

An algorithm for converting a Grobner basis G1 of an arbitrary ideal from one ordering
<1 to a Grobner basis G2 with respect to another target ordering <2 is presented. Its
behavior does not depend upon the dimension of the input ideal and it works for any
positive or zero dimensional ideal. The target ordering can be any ordering including
elimination orderings such as pure lexicographic.

The algorithm is similar to the FGLM algorithm for basis conversion of zero dimensional
ideals in the following sense: it incrementally constructs (i) a set T of terms and (ii) a set G
of polynomials, by computing the normal forms of the terms in T with respect to <1 and
solving some linear equations similar to those in FGLM. However, the order in which terms
are generated in T is not completely determined by the target ordering <2. If <2 does not
have the property that for every term t, there are only finitely many terms smaller than t,
then an enumeration ordering <e is built using <1 and <2 such that it has that property.

For the termination condition, it is first checked whether all the polynomials in G1 reduce to 0
modulo G, and if so, it is then checked whether G is a Grobner basis. We prove that there is a
T which will be generated within a finite number of steps for which the termination condition
succeeds.

The  algorithm  has  been  implemented  in  GEOMETER  and  has  been  successfully  tried 
on  a  number  of  examples.  It  seems  to  work  better  than  computing  a  Grobner  basis  directly 
for  ideals  with  small  dimensions.  Planned  extensions  include  development  of  heuristics  for 
developing  good  enumeration  orderings  as  well  as  an  efficient  termination  check.  For  the 
latter,  the  use  of  a  Hilbert  function  of  an  ideal  is  being  investigated. 

1  Introduction 

A Grobner basis algorithm is considered one of the most efficient elimination algorithms in
practice for solving many problems related to polynomial ideals as well as the algebraic
varieties defined by polynomial ideals. There has been a great deal of interest in obtaining efficient
algorithms for computing Grobner bases. This is especially true for computations under an
elimination ordering of the variables, as a Grobner basis with respect to an elimination
ordering is the most useful computational object for many problems. It is generally agreed that
Grobner basis computations are the hardest for elimination orderings of the variables like pure
lexicographic ordering. For zero-dimensional polynomial ideals, this problem was tackled in
a novel manner in [1]. Instead of computing a Grobner basis with respect to an elimination
ordering directly, Faugere et al. computed a Grobner basis w.r.t. some other ordering, like
graded lexicographic (also popularly known as the degree ordering) or graded reverse
lexicographic ordering, which is easy to compute, and the basis was transformed into a basis w.r.t.
the desired elimination ordering. This transformation was performed using linear algebra
techniques. Impressive results were reported in [1] using this technique of computing Grobner
bases of ideals w.r.t. elimination orderings. We will refer to it as the FGLM algorithm.

It has not been clear how the FGLM algorithm can be generalized to positive dimensional
ideals. If the algorithm is applied on a Grobner basis of a positive dimensional ideal, then
it does not terminate. The termination of the algorithm depends upon the property that for
a zero-dimensional ideal, the set of all reduced terms with respect to a Grobner basis in its
monomial ideal is finite. For a positive dimensional ideal, this set of reduced terms in its
monomial ideal is infinite; consequently the FGLM algorithm goes on enumerating
terms from the reduced set forever.

We present an algorithm whose termination does not depend upon the finiteness property
of the set of reduced terms with respect to a Grobner basis in a monomial ideal. Our algorithm,
instead, depends on the property that a Grobner basis of a polynomial ideal with respect to
any admissible ordering is finite. Hence the algorithm works for ideals of any dimension. This
algorithm can be viewed as a generalization of the algorithm developed in [1] for zero dimensional
ideals.

The  paper  is  organized  as  follows.  The  second  section  first  reviews  the  algorithm  for  zero 
dimensional  ideals  as  developed  in  [1].  We  introduce  the  main  new  idea,  using  which  the 
FGLM  algorithm  can  be  generalized  so  that  it  no  longer  depends  on  the  finiteness  of  the  set 
of  reduced  terms  with  respect  to  the  target  Grobner  basis  in  a  monomial  ideal.  The  third 
section  discusses  a  naive,  simple  basis  conversion  algorithm  which  works  for  arbitrary  ideals. 
In  the  fourth  section,  an  efficient  version  of  the  algorithm  is  discussed.  The  fifth  section  gives 
the  correctness  argument  for  the  algorithm.  In  the  sixth  section,  heuristics  are  discussed  to 
improve  the  performance  of  an  implementation  of  the  algorithm.  In  section  seven,  we  present 
two  simple  examples  illustrating  how  the  algorithm  transforms  Grobner  bases  w.r.t.  graded 
lexicographic  orders  (total  degree  orders)  to  the  bases  for  pure  lexicographic  orders.  Finally, 
in  section  eight,  some  preliminary  empirical  results  are  reported. 


2  A  Generalization  of  the  Zero  Dimensional  Algorithm 

The basis conversion algorithm for zero dimensional ideals in [1] mainly relied on the fact that
there are only a finite number of reduced terms w.r.t. any Grobner basis of a zero dimensional
ideal. This gives rise to a staircase-like volume which encloses all the reduced terms. We give
below the main steps of the algorithm given in [1]. The algorithm is called FGLM and it takes
as input a Grobner basis G1 of the ideal with respect to an ordering <1. Its output is G2, the
reduced Grobner basis w.r.t. a target ordering <2. In the following, a subscript <i means
that the comparison is done using the ordering <i. For example, larger<2 means larger
with respect to <2. Also, NF<1(t) is a function that computes the normal form of a term t
modulo G1 w.r.t. <1.

FGLM(G1)

Input  :  A Grobner basis G1 w.r.t. <1.

Output :  The reduced Grobner basis G2 w.r.t. <2.

Algorithm :

begin
    B := ∅
    InitialTerms := ∅
    t_0 := 1
    i := 1
    while there exists a term larger<2 than t_{i-1} which is not
          divisible by any element of InitialTerms do
        t_i := the smallest<2 term larger<2 than t_{i-1} which is
               not divisible by any element of InitialTerms
        solve the equation NF<1(t_i) + Σ_{j<i} λ_j NF<1(t_j) = 0.
        if solution exists
        then
            let λ_j's be the solutions
            B := B ∪ {t_i + Σ_{j<i} λ_j t_j}
            InitialTerms := InitialTerms ∪ {t_i}
        else
            i := i + 1
        fi
    od
    return(B)
end.


The algorithm is guaranteed to terminate because for zero-dimensional ideals, the number
of reduced terms for its Grobner basis w.r.t. any admissible ordering is finite.

This algorithm cannot be used for positive dimensional ideals because in general, it does not
terminate; there could potentially be infinitely many reduced terms with respect to a Grobner
basis of a positive dimensional ideal. The enumeration process could go on forever and we
may never get a Grobner basis. For example, consider the case when the reduced Grobner
basis w.r.t. the pure lexicographic order (with x > y > z) is {x + yz, y^2 z - z}. If FGLM was
run on a Grobner basis of the same ideal with a degree ordering, terms purely in z would be
enumerated and this enumeration would go on forever since there is no polynomial purely in
z in this one dimensional ideal!

We will now present a generalization of the FGLM algorithm and use this generalization
to get an algorithm for the positive dimensional case. The main idea is to parameterize the
FGLM algorithm by enumerating terms only from a certain finite set T. So the inputs to this
algorithm (which we call Solve) are a Grobner basis w.r.t. <1 and also an additional input
which is a set of terms T. We will not specify what the output of this program is at this point,
but we will discuss some properties of it after the description.
Solve(G1, T)

Input  :  A Grobner basis G1 w.r.t. <1 and a set of terms T.

Algorithm :

begin
    B := ∅
    InitialTerms := ∅
    t_0 := smallest<2 term in T
    i := 1
    while there exists a term larger<2 than t_{i-1} in T which is not
          divisible by any element of InitialTerms do
        t_i := the smallest<2 term larger<2 than t_{i-1} in T which is
               not divisible by any element of InitialTerms
        solve the equation NF<1(t_i) + Σ_{j<i} λ_j NF<1(t_j) = 0.
        if solution exists
        then
            let λ_j's be the solutions
            B := B ∪ {t_i + Σ_{j<i} λ_j t_j}
            InitialTerms := InitialTerms ∪ {t_i}
        else
            i := i + 1
        fi
    od
    return(B)
end.


This algorithm is very similar to the FGLM algorithm outlined above except that it
enumerates only those terms which are in a predetermined set T. We refer to this algorithm as
Solve. Some useful properties of this algorithm are:

•  If T is the set of all the terms in the reduced Grobner basis with respect to the target
ordering, then the algorithm returns G2, the desired Grobner basis w.r.t. <2. Obviously,
having T be a superset of all the terms in the reduced Grobner basis with respect to
the target ordering would also suffice.

•  In case T does not contain all the terms in the reduced Grobner basis w.r.t. <2, then
the algorithm may not give us a Grobner basis, but B generates a subideal of the ideal
of the input Grobner basis.

These properties of this algorithm enable us to develop an algorithm for positive
dimensional ideals in the following sections.
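The behaviour of Solve on the earlier example can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: terms are exponent tuples (x, y, z), polynomials are dictionaries from terms to coefficients, the linear system is solved by incremental elimination on normal forms, and the input basis {yz + x, xy + z, x^2 - z^2} is a graded lexicographic Grobner basis that we worked out by hand for the ideal generated by x + yz and y^2 z - z. Unlike the pseudocode, the sketch also tests the smallest term of T.

```python
from fractions import Fraction
from itertools import product


def grlex(t):  # <1: graded lexicographic with x > y > z
    return (sum(t), t)


def lex(t):  # <2: pure lexicographic with x > y > z
    return t


def divides(a, b):
    return all(i <= j for i, j in zip(a, b))


def normal_form(poly, basis, key):
    """Remainder of poly on division by basis w.r.t. the order `key`."""
    poly, rem = {m: Fraction(c) for m, c in poly.items()}, {}
    while poly:
        lt = max(poly, key=key)
        c = poly.pop(lt)
        for g in basis:
            hg = max(g, key=key)
            if divides(hg, lt):
                shift = tuple(a - b for a, b in zip(lt, hg))
                for m, cm in g.items():
                    if m != hg:
                        mm = tuple(a + b for a, b in zip(m, shift))
                        v = poly.get(mm, 0) - c / g[hg] * cm
                        if v:
                            poly[mm] = v
                        else:
                            poly.pop(mm, None)
                break
        else:
            rem[lt] = c  # no head term divides lt, so it is reduced
    return rem


def axpy(target, scale, source):
    """target += scale * source, dropping zero coefficients."""
    for m, cm in source.items():
        v = target.get(m, 0) + scale * cm
        if v:
            target[m] = v
        else:
            target.pop(m, None)


def solve(g1, terms, key1=grlex, key2=lex):
    rows, found, initial = [], [], []
    for t in sorted(terms, key=key2):
        if any(divides(h, t) for h in initial):
            continue
        vec = normal_form({t: 1}, g1, key1)
        combo = {t: Fraction(1)}
        for pivot, rvec, rcombo in rows:  # eliminate against earlier terms
            c = vec.get(pivot)
            if c:
                axpy(vec, -c, rvec)
                axpy(combo, -c, rcombo)
        if vec:  # normal form is independent: t is a reduced term
            pivot = max(vec, key=key1)
            c = vec[pivot]
            rows.append((pivot, {m: v / c for m, v in vec.items()},
                         {m: v / c for m, v in combo.items()}))
        else:  # a linear relation was found: combo lies in the ideal
            found.append(combo)
            initial.append(t)
    return found


G1 = [{(0, 1, 1): 1, (1, 0, 0): 1},    # yz + x
      {(1, 1, 0): 1, (0, 0, 1): 1},    # xy + z
      {(2, 0, 0): 1, (0, 0, 2): -1}]   # x^2 - z^2
T3 = [e for e in product(range(4), repeat=3) if sum(e) <= 3]
T2 = [e for e in T3 if sum(e) <= 2]
B3, B2 = solve(G1, T3), solve(G1, T2)
```

With all terms up to degree 3, the sketch returns y^2 z - z and x + yz, matching the first property. With terms only up to degree 2 it returns just x + yz, which generates a proper subideal, matching the second.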


3  The  positive  dimensional  case 

As discussed earlier, the FGLM algorithm would not terminate for ideals of positive dimensions
because its termination relies on the fact that the set of reduced terms with respect to the
Grobner basis is finite. In fact, the algorithm would not give any useful information for a
lexicographic ordering if the ideal did not include a polynomial purely in the lowest variable
in the ordering.

On  the  other  hand,  the  utility  of  the  output  as  well  as  the  efficiency  of  the  generalized 
algorithm  Solve  depends  upon  the  finiteness  of  T  as  well  as  an  appropriate  choice  for  T.  Since 
the  reduced  Grobner  basis  is  finite  for  any  admissible  ordering,  if  T  can  be  built  by  enumerating 
terms  in  such  a  way  that  eventually  all  terms  appearing  in  the  reduced  Grobner  basis  w.r.t. 
the  desired  target  ordering  can  be  included  in  T,  Solve  would  then  return  a  Grobner  basis.  So 
the  parameter  to  Solve,  the  set  of  terms  T  may  be  incrementally  generated  by  an  enumeration 
ordering.  Instead  of  requiring  that  the  terms  be  enumerated  in  the  same  order  as  in  the  target 
ordering,  we  only  require  that  terms  are  enumerated  in  such  an  order  that  we  eventually 
generate  enough  terms  to  include  terms  appearing  in  the  reduced  Grobner  basis  with  respect 
to  the  target  ordering. 

A good enumeration ordering <e guarantees that eventually T will be a superset of all the
terms in the reduced Grobner basis w.r.t. <2. But how do we know that we have generated
enough terms so that the set of terms that we have is a superset of the set of terms in the
reduced Grobner basis? In other words, how do we recognize that T is good enough, even though
we are guaranteed that within a finite amount of time, it would be? This is the termination
condition. Clearly it cannot be that every term greater than the terms generated so far in
the enumeration ordering is reducible (as in 0-dimensional FGLM). For positive dimensional
ideals, there are infinitely many reduced terms. This is an interesting research issue to be
further investigated. Currently, there are two ideas: (i) run the algorithm Solve each time a
term from the enumeration ordering is produced and test if the output of Solve is a Grobner
basis and also if it reduces all the polynomials of G1 to zero; (ii) use the Hilbert function
(polynomial) for checking the number of reduced terms of various degrees. Below we sketch
the algorithm that we have proposed here. A more efficient and detailed algorithm will be
given in the next section. Assume that we have a good enumeration ordering <e. Notice that
any sequential (or degree compatible) ordering would suffice.

Algorithm(G1)

1.  T := {1}

2.  B := ∅

3.  while Grobner?(B) is false do

(a)  t := the smallest term larger<e than all the terms in T

(b)  T := T ∪ {t}

(c)  B := Solve(G1, T)

4.  return(B)

Here, the predicate Grobner?(B) returns true if B is a Grobner basis, else it returns false.
Also, reducesB(G1) returns true if all the polynomials in G1 reduce to zero modulo B. We
now discuss the two main steps in the general algorithm:

•  How  to  generate  T  systematically  and  efficiently  based  on  the  input  Grobner  basis  and 
the  target  ordering  so  that  eventually  all  terms  in  the  output  Grobner  basis  would  be 
included? 
•  How can the two tests - whether a given basis generates the same ideal as the input
basis, and whether a given basis is a Grobner basis - be performed efficiently?

Incrementing  T 

If the target ordering is degree compatible, then it could be used as an enumeration ordering.
A degree compatible ordering has the nice property that there are finitely many terms smaller
than any term in the ordering. For arbitrary ideals, elimination orders such as lexicographic
orderings are not likely to work as enumeration orderings because in an elimination ordering,
there can be infinitely many terms below the smallest head-term of a Grobner basis with respect
to the target order. For the example above, there are infinitely many terms lower than the smallest
head term y^2 z. In general, it should be possible to make use of the structure of the input
Grobner basis to pick an appropriate ordering which is a mixture of degree compatible and
elimination orderings, such as block orderings. This issue is under further investigation. If the
target ordering is degree compatible, then it is used as an enumeration ordering. If the target
ordering is not degree compatible but the input Grobner basis is computed using a degree
compatible ordering, then the input ordering is used for enumeration. Otherwise, a degree
ordering (with the input order to break ties) is used. We call this order <e. So the strategy for
incrementing T is as follows:

Start  with  an  empty  T  and  increment  it  by  the  terms  in  an  ascending  <e  order. 

Assume that T_s is the set of all terms which are smaller, w.r.t. <e, than a term s. Let the
largest term, w.r.t. <e, which appears in a reduced Grobner basis with respect to the target
ordering be q. Since T_q is finite, it is easy to see that T will equal T_q within a finite number of
steps. This guarantees that the general algorithm will terminate.
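A degree-compatible enumeration ordering of this kind can be sketched as follows. The sketch assumes terms are exponent tuples and that the tie-breaking order is supplied as a Python key function; the names `degree_then` and `enumerate_terms` are hypothetical, and the enumeration is cut off at a chosen maximum degree so that it stays finite.

```python
from itertools import product


def degree_then(tie_break):
    """Key for <e: compare total degree first, then the tie-break order."""
    return lambda t: (sum(t), tie_break(t))


def enumerate_terms(n_vars, max_degree, key):
    """Yield every term of degree <= max_degree in ascending <e order."""
    terms = [e for e in product(range(max_degree + 1), repeat=n_vars)
             if sum(e) <= max_degree]
    yield from sorted(terms, key=key)


def count_below(term, n_vars, key):
    """Number of terms strictly smaller than `term` under a degree-first key.

    Only terms of degree <= degree(term) can be smaller, so the count
    is finite; this is the property that makes T reach its target.
    """
    return sum(1 for t in enumerate_terms(n_vars, sum(term), key)
               if key(t) < key(term))


# <e built from pure lexicographic order (x > y) as the tie-breaker.
lex_tie = degree_then(lambda t: t)
ascending = list(enumerate_terms(2, 2, lex_tie))
```

Under pure lexicographic order with x > y, by contrast, every power of y lies below x, so no finite prefix of the enumeration would ever reach x.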

Because the enumeration ordering is allowed to differ from the target ordering, a newly
enumerated term is not necessarily greater in the target ordering than previously enumerated
terms. So when we solve for reducible enumerated terms, we may get more than one
polynomial as potential candidates for being elements of a Grobner basis with respect to the
target ordering.

Termination  test 

Two tests must succeed for the algorithm to terminate. The basis B generated by the
general algorithm must generate the same ideal as the input basis, and B should be a Grobner
basis with respect to the target ordering. One way to perform the first test is to check whether
all polynomials in the input Grobner basis reduce to 0 using B; this would ensure that B
generates the same ideal as the input Grobner basis since every element in B is in the ideal of
the input Grobner basis.

For  the  second  test,  one  possibility  is  to  check  whether  all  nontrivial  S-polynomials  (critical 
pairs)  of  B  reduce  to  0.  This  test  can  be  performed  incrementally  making  use  of  the  information 
from  previous  unsuccessful  attempts  of  the  test. 
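The two tests can be sketched directly, without the incremental bookkeeping mentioned above. The representation (exponent tuples, dictionaries of coefficients), the function names, and the example bases are our own assumptions; the graded lexicographic input basis {yz + x, xy + z, x^2 - z^2} was worked out by hand for the ideal generated by x + yz and y^2 z - z. The sketch also assumes that every element of B already lies in the ideal, as is the case for the output of Solve.

```python
from fractions import Fraction
from itertools import combinations


def lex(t):  # target order <2: pure lexicographic with x > y > z
    return t


def divides(a, b):
    return all(i <= j for i, j in zip(a, b))


def normal_form(poly, basis, key):
    """Remainder of poly on division by basis w.r.t. the order `key`."""
    poly, rem = {m: Fraction(c) for m, c in poly.items()}, {}
    while poly:
        lt = max(poly, key=key)
        c = poly.pop(lt)
        for g in basis:
            hg = max(g, key=key)
            if divides(hg, lt):
                shift = tuple(a - b for a, b in zip(lt, hg))
                for m, cm in g.items():
                    if m != hg:
                        mm = tuple(a + b for a, b in zip(m, shift))
                        v = poly.get(mm, 0) - c / g[hg] * cm
                        if v:
                            poly[mm] = v
                        else:
                            poly.pop(mm, None)
                break
        else:
            rem[lt] = c
    return rem


def s_polynomial(f, g, key):
    """S-polynomial of f and g: cancel their head terms at the lcm."""
    hf, hg = max(f, key=key), max(g, key=key)
    lcm = tuple(max(a, b) for a, b in zip(hf, hg))
    out = {}
    for poly, head, sign in ((f, hf, 1), (g, hg, -1)):
        shift = tuple(a - b for a, b in zip(lcm, head))
        for m, cm in poly.items():
            mm = tuple(a + b for a, b in zip(m, shift))
            v = out.get(mm, 0) + sign * Fraction(cm) / poly[head]
            if v:
                out[mm] = v
            else:
                out.pop(mm, None)
    return out


def termination_test(g1, b, key2=lex):
    # Test 1: every input polynomial reduces to 0 modulo B.
    same_ideal = all(not normal_form(f, b, key2) for f in g1)
    # Test 2: every S-polynomial of B reduces to 0 modulo B.
    is_groebner = all(not normal_form(s_polynomial(f, g, key2), b, key2)
                      for f, g in combinations(b, 2))
    return same_ideal and is_groebner


G1 = [{(0, 1, 1): 1, (1, 0, 0): 1},    # yz + x
      {(1, 1, 0): 1, (0, 0, 1): 1},    # xy + z
      {(2, 0, 0): 1, (0, 0, 2): -1}]   # x^2 - z^2
FULL = [{(1, 0, 0): 1, (0, 1, 1): 1},  # x + yz
        {(0, 2, 1): 1, (0, 0, 1): -1}] # y^2 z - z
PARTIAL = FULL[1:]
```

The partial basis shows why the first test is needed: a single polynomial is trivially a Grobner basis of the ideal it generates, yet it fails to reduce the input basis to zero.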

There are perhaps other ways of testing whether a Gröbner basis of the same ideal has already been generated. One possibility is to use the Hilbert function of the ideal, computed from the Gröbner basis given as the input. The main idea is to count the number of irreducible terms for every degree, and then use this information to decide whether a Gröbner basis with respect to the target ordering has been generated or not. This approach has not yet been fully studied in this framework, but it may have interesting applications. We are particularly concerned about the difficulty in computing a Hilbert function of positive dimensional ideals.
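The counting step behind this idea can be sketched as follows. This is only an illustration of counting irreducible terms per degree from a set of head terms; the function names and the sample head terms (yz, xy, x^2 in the variables x, y, z) are assumptions made for the example, and the sketch does not address how the counts would drive the termination decision.

```python
from itertools import combinations_with_replacement

def monomials_of_degree(nvars, d):
    """Yield every exponent tuple of total degree d in nvars variables."""
    for combo in combinations_with_replacement(range(nvars), d):
        exps = [0] * nvars
        for i in combo:
            exps[i] += 1
        yield tuple(exps)

def divides(a, b):
    """True if the term a divides the term b."""
    return all(i <= j for i, j in zip(a, b))

def irreducible_count(head_terms, nvars, d):
    """Number of degree-d terms not divisible by any head term."""
    return sum(1 for m in monomials_of_degree(nvars, d)
               if not any(divides(h, m) for h in head_terms))

# Hypothetical head terms y z, x y, x^2 with exponents ordered (x, y, z).
heads = [(0, 1, 1), (1, 1, 0), (2, 0, 0)]
counts = [irreducible_count(heads, 3, d) for d in range(4)]
print(counts)
```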

In the next section, we present a more efficient variant of this algorithm. In a later section, we present heuristics that make this algorithm faster.


4  Basis  conversion  algorithm 

The following algorithm, BasisConvert, generates a Gröbner basis G2 w.r.t. a target ordering <2, given a Gröbner basis G1 w.r.t. a term order <1. The Solve algorithm has been slightly changed in this section for efficiency considerations.

BasisConvert(G1)

Input: A Gröbner basis G1 w.r.t. <1.

Output: A Gröbner basis G2 w.r.t. <2 of the ideal generated by G1.

Algorithm:

begin
    G2 := φ;
    InitialTerms := φ;
    ReducedTerms := φ;
    t := 1;
    While not(TerminationTest(G1, G2)) do
        t := NextTerm(t, InitialTerms);
        ReducedTerms := ReducedTerms ∪ {t};
        G' := Solve(G1, ReducedTerms);
        if G' ≠ φ then
            G2 := G2 ∪ G';
            InitialTerms := InitialTerms ∪ HeadTerms(G');
            ReducedTerms := RemoveMultiples(ReducedTerms, HeadTerms(G'));
        fi;
    od;
    Return(G2);
end;


The input/output specifications of some of the procedures used in the above algorithm are given below. Procedure Solve is described fully. The enumeration order <e is defined to be <2 if <2 is degree compatible, <1 if <1 is degree compatible; otherwise it is defined as follows:

a <e b if degree(a) < degree(b) or (degree(a) = degree(b) and a <1 b)

NextTerm(t, S)

Input: A term t and a set of terms S.

Output: The smallest term larger than t w.r.t. <e and not divisible by any term in the set S.

HeadTerms(S)

Input: A set of polynomials S.

Output: The set of head terms of all polynomials in S w.r.t. <2.

RemoveMultiples(T, S)

Input: A set of terms T and another set of terms S.

Output: The set T - multiples(S), where multiples(S) is the set of all multiples of the terms in S.
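The two term-level helpers can be sketched as below. This is an illustrative sketch under stated assumptions: terms are exponent tuples, the enumeration order compares total degree first and breaks ties lexicographically, and `max_degree` is a hypothetical safety bound added so that the search stops when every remaining term is a multiple of some term in S.

```python
from itertools import combinations_with_replacement

def monomials_of_degree(nvars, d):
    """Yield every exponent tuple of total degree d in nvars variables."""
    for combo in combinations_with_replacement(range(nvars), d):
        exps = [0] * nvars
        for i in combo:
            exps[i] += 1
        yield tuple(exps)

def divides(a, b):
    """True if the term a divides the term b."""
    return all(i <= j for i, j in zip(a, b))

def next_term(t, s, max_degree=20):
    """Smallest term above t (degree first, then lexicographic)
    that is not divisible by any term in s; None if none is found."""
    for d in range(sum(t), max_degree + 1):
        for m in sorted(monomials_of_degree(len(t), d)):
            if (d, m) > (sum(t), t) and not any(divides(h, m) for h in s):
                return m
    return None

def remove_multiples(terms, s):
    """The set `terms` minus all multiples of the terms in s."""
    return {m for m in terms if not any(divides(h, m) for h in s)}

# Exponents are ordered (x, y, z).
one, x, y2 = (0, 0, 0), (1, 0, 0), (0, 2, 0)
print(next_term(one, set()))   # first term after 1
print(next_term(y2, {x}))      # skips x z, x y, x^2 and moves to degree 3
```

Passing the order as an explicit degree-then-lexicographic comparison keeps the sketch short; a fuller version would take the tie-breaking order as a parameter.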

Solve(G1, T)

Input: A Gröbner basis G1 w.r.t. <1, a set of terms T and any term order <.

Output: The set B of all polynomials p which satisfy the following properties:

1. p is in the ideal generated by G1,

2. p is a linear combination of the elements of T, and

3. There is no polynomial p' which is in the ideal generated by G1 and is a linear combination of terms in T, but whose head term HeadTerm(p') divides some term of p.

Algorithm:

begin
    B := φ
    $t_0 := 1$
    i := 1
    while there exists a term larger than $t_{i-1}$ in T w.r.t. < do
        $t_i$ := the smallest term larger than $t_{i-1}$ in T (w.r.t. <)
        solve the equation $NF_{<_1}(t_i) + \sum_{j=1}^{i-1} \lambda_j\, NF_{<_1}(t_j) = 0$
        if solution exists
        then
            let the $\lambda_j$'s be the solutions
            $B := B \cup \{t_i + \sum_{j=1}^{i-1} \lambda_j t_j\}$
            T := RemoveMultiples(T, $t_i$)
        else
            i := i + 1
        fi
    od
    return(B)
end.
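The linear-algebra step inside Solve, detecting that the normal form of a new term is a linear combination of the normal forms of earlier terms, can be sketched with exact rational arithmetic as follows. The function name `add_term`, the row representation and the sample normal forms are assumptions made for illustration; this is plain incremental Gaussian elimination, not the implementation used in the reported experiments.

```python
from fractions import Fraction

def add_term(rows, label, normal_form):
    """Process one term; return a dependency {term: coeff} or None.

    rows holds (pivot, vector, combo) triples for the independent normal
    forms seen so far; combo records which terms each vector was built from.
    """
    vec = {k: Fraction(v) for k, v in normal_form.items() if v}
    combo = {label: Fraction(1)}
    for pivot, rvec, rcombo in rows:
        c = vec.get(pivot)
        if c:
            factor = c / rvec[pivot]
            # Eliminate the pivot entry of vec using the stored row.
            for k, v in rvec.items():
                nv = vec.get(k, 0) - factor * v
                if nv:
                    vec[k] = nv
                else:
                    vec.pop(k, None)
            for k, v in rcombo.items():
                combo[k] = combo.get(k, 0) - factor * v
    if vec:
        # Independent of the earlier normal forms: keep it as a new row.
        rows.append((next(iter(vec)), vec, combo))
        return None
    # vec vanished: the combination of terms in combo has normal form 0.
    return {k: v for k, v in combo.items() if v}

# Hypothetical normal forms: NF(z) = z, NF(yz) = -x, NF(x) = x.
rows = []
add_term(rows, "z", {"z": 1})
add_term(rows, "yz", {"x": -1})
dep = add_term(rows, "x", {"x": 1})
print(dep)
```

Here the returned dependency has coefficient 1 on both x and yz, which corresponds to the ideal element x + yz.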


TerminationTest(G1, G2)

Input: A Gröbner basis G1 w.r.t. <1 and the set of polynomials G2.

Output: True if G2 is a Gröbner basis of the ideal generated by G1 w.r.t. <2, else False.

Algorithm: Test if every polynomial in G1 normalizes to zero w.r.t. G2. If they don't, then return false; else test if G2 is a Gröbner basis. More details and heuristics for this procedure will be outlined in section 6.

5  Proof  of  correctness  of  BasisConvert 

5.1  Proof  of  correctness  of  Solve 

For a fixed T and an ordering <2, the set of all polynomials which satisfy all the conditions in the output of Solve is unique (call it $P_{T,<_2}$). Let $P_{T,<_2} = \{p_1, p_2, \ldots, p_n\}$. Also let this be an ordered set, i.e., $ht_{<_2}(p_i) <_2 ht_{<_2}(p_j)$ if $i < j$. The following theorem on $P_{T,<_2}$ proves the correctness of procedure Solve:

Theorem 5.1 For a given T and an ordering <2, Solve(G1, T) produces all the polynomials in $P_{T,<_2}$, in order.

Proof: We use mathematical induction on i (where $p_i$ is the i-th polynomial in $P_{T,<_2}$) to prove this theorem.

Base case Let $q_1$ be the first polynomial that Solve produces. $q_1$ satisfies the first requirement in the output of Solve, because the fact that the linear equation which resulted in $q_1$ was solvable implies that $q_1$ is in the ideal generated by G1. The second condition is also satisfied since the linear equation which resulted in $q_1$ was only in terms of the normal forms of the terms in T. The third condition is satisfied because in the procedure Solve we are enumerating terms from T in <2 order. Hence if there was any polynomial smaller than $q_1$ which was in the ideal and could be formed using terms in T, its corresponding linear equation would have been solvable, so we would already have it and $q_1$ would not be the first one. For any polynomial's head term to divide some term in $q_1$, that polynomial must be smaller than $q_1$. Since we do not have any polynomial with a smaller head term w.r.t. <2, the third condition is also satisfied. Hence $q_1$ is the smallest polynomial which satisfies all three conditions in the output specification of Solve, so $q_1 = p_1$.

Induction hypothesis Let us assume that the first i polynomials produced by the algorithm are $p_1, \ldots, p_i$.

We now prove that the (i+1)-th polynomial produced by Solve ($q_{i+1}$) is the same as the (i+1)-th polynomial of $P_{T,<_2}$, i.e., $p_{i+1}$. This can be proven in the same way as the base case by replacing T by $T_i$ and $P_{T,<_2}$ by $P_{T_i,<_2}$ in the proof, where

$$T_i = T - \{\mathit{multiples}(ht_{<_2}(p_1)),\ \mathit{multiples}(ht_{<_2}(p_2)),\ \ldots,\ \mathit{multiples}(ht_{<_2}(p_i))\}$$

and realizing the simple fact that

$$P_{T_i,<_2} = \{p_{i+1}, p_{i+2}, \ldots, p_n\}$$


□ 


This concludes the proof of correctness of Solve.

5.2 Proof of correctness of BasisConvert

We  will  now  prove  that  within  a  finite  number  of  steps,  BasisConvert  will  return  a  Grobner 
basis  of  the  ideal  w.r.t.  the  target  ordering.  We  prove  a  lemma  which  establishes  that  within 
a  finite  number  of  steps,  T  will  contain  all  the  terms  of  the  reduced  Grobner  basis  and  then 
the  Grobner  basis  will  be  computed  by  invoking  Solve  at  this  point. 

Lemma 5.2 Let s be a term. In a finite number of steps of the algorithm, all the terms which are smaller than s w.r.t. <e will either be in the set ReducedTerms or be divisible by some term in the set InitialTerms.

Proof: We say that a term is being considered by the procedure NextTerm when it is either produced by NextTerm or it is skipped by NextTerm. Notice that a term is said to have been produced by NextTerm if it is actually returned by the procedure.

Since the enumeration is being done in <e order, which is degree compatible, s will be considered in a finite number of steps. At that point, all the terms smaller than s would already have been considered by NextTerm. For each smaller term $s_i$ (all of which were considered), we can do the following analysis:

1. $s_i$ was not produced. Since it was not produced, it must be divisible by some term in InitialTerms, and will continue to be so since nothing is ever removed from InitialTerms.

2. $s_i$ was produced, hence one of the following will be true:

(a) $s_i$ was put in the set InitialTerms (and then will continue to be in that set), or

(b) $s_i$ was put in ReducedTerms. Hence, one of the following will be true:

i. $s_i$ will continue to be in ReducedTerms, or

ii. $s_i$ will become divisible by some term in InitialTerms at a later point of time, and be removed from ReducedTerms.

Hence, when s is being considered, all terms smaller than s would either be in ReducedTerms or be divisible by some element of InitialTerms. □

We will denote the set of all terms smaller than or equal to a term s w.r.t. <e by $T_s$. Also, $RG_{<}$ denotes the reduced Gröbner basis of an ideal w.r.t. the ordering <. If S is a set of polynomials, then tail(S) is the difference of the set of all terms in all the polynomials of S and the set of head terms of all polynomials in S.

Lemma 5.3 Let $T(RG_{<_2})$ be the set of all terms in the reduced Gröbner basis of the ideal generated by the original Gröbner basis G1 w.r.t. the target term ordering. In a finite number of steps of the algorithm, all the following become true together:

1. $T(RG_{<_2}) \subseteq \mathit{ReducedTerms} \cup \mathit{InitialTerms}$

2. $\mathit{tail}(RG_{<_2}) \subseteq \mathit{ReducedTerms}$

3. $\forall p \in RG_{<_2},\ ht(p) \in \mathit{InitialTerms} \Rightarrow \exists q \in G_2,\ ht(p) = ht(q)$

4. $\forall p \in RG_{<_2},\ ht(p) \in \mathit{ReducedTerms} \Rightarrow p \in \mathit{Solve}(G_1, \mathit{ReducedTerms})$


Proof: From the properties of Solve, we can assume that all the polynomials in G2 are from the ideal generated by G1.

We  first  prove  that  within  a  finite  number  of  steps,  the  first  condition  will  be  satisfied. 
Then  we  prove  that  the  first  condition  implies  all  the  remaining  conditions,  hence  proving  the 
lemma. 

1. First condition becomes true in a finite number of steps: Let the largest term in $T(RG_{<_2})$ w.r.t. <e be s. Then by the previous lemma, within a finite number of steps, every term of $T_s$ ($\supseteq T(RG_{<_2})$) will either be in ReducedTerms or be divisible by some term in InitialTerms. But all the terms in InitialTerms are head terms of some polynomial in the ideal. Since there is no term $t \in T(RG_{<_2})$ which can be divisible by the head term (say h) of another polynomial in the ideal unless h = t, all the terms of $T(RG_{<_2})$ must either be in ReducedTerms or be in InitialTerms (as opposed to simply being divisible by some term in InitialTerms).

2. First condition implies the second condition: The previous condition implies that all the terms in the tail of every polynomial in $RG_{<_2}$ must either be in ReducedTerms or be in InitialTerms. But any term in the tail of a polynomial in the reduced Gröbner basis can never be the head term of another polynomial in the ideal (since this contradicts the fact that $RG_{<_2}$ is reduced). Hence all the terms in the tail of all the polynomials of $RG_{<_2}$ must be in the set ReducedTerms.

3.  Third  condition  is  always  true:  The  third  condition  is  an  invariant  of  the  main  loop  of 
the  algorithm.  If  any  term  is  in  InitialTerms,  it  is  clear  from  the  algorithm  that  it  must 
be  the  case  that  there  is  a  polynomial  in  G2  whose  head  term  is  the  same  as  this  term 
(since  we  only  put  head  terms  in  InitialTerms). 

4. Previous conditions imply the fourth condition: If the head term of any polynomial in $RG_{<_2}$ is in ReducedTerms, then, since the tail of this polynomial must be in ReducedTerms (by the second condition), this polynomial would satisfy the second condition in the output of Solve. It will also satisfy the first condition of the output of Solve since it is obviously in the ideal. Moreover, since this polynomial is a member of the reduced Gröbner basis, there is no polynomial at all in the ideal whose head term divides any term in this polynomial (unless it is larger than this polynomial w.r.t. <2), so it also satisfies the third condition of the output of Solve. Since it satisfies all three conditions of the output of Solve, it must be produced by Solve when invoked with ReducedTerms.

□ 


Theorem  5.4  In  a  finite  number  of  steps,  G2  would  be  a  Grobner  basis  for  the  ideal  generated 
by  Gi. 

Proof: By the previous lemma, after a finite number of steps, the head terms of all polynomials in the reduced Gröbner basis are either in ReducedTerms, and then by the fourth condition of the previous lemma they will be produced (and hence be put in G2), or in InitialTerms, in which case, by the third condition of the previous lemma, there already exists a polynomial in G2 which has the same head term as this one. So for every polynomial in $RG_{<_2}$, there exists a polynomial in G2 which has the same head term. Hence by Proposition 5.38 of [2], G2 must be a Gröbner basis. □

This last theorem proves the correctness of BasisConvert. Notice that BasisConvert does not necessarily produce the reduced Gröbner basis w.r.t. <2. All that we have proved is that it produces a Gröbner basis. The reduced Gröbner basis can easily be obtained from this Gröbner basis by deleting all polynomials whose head terms are multiples of the head terms of other polynomials and then interreducing the rest.
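The deletion step can be sketched as follows, assuming a pure lexicographic order with x > y > z and polynomials stored as dictionaries keyed by exponent tuples. The function name `delete_redundant` and the sample basis are illustrative assumptions; the subsequent interreduction step is not shown.

```python
def head(p):
    """Head term under lexicographic order (plain tuple comparison)."""
    return max(p)

def divides(a, b):
    """True if the term a divides the term b."""
    return all(i <= j for i, j in zip(a, b))

def delete_redundant(basis):
    """Drop every polynomial whose head term is a multiple of the head
    term of another polynomial in the basis (ties keep the earlier one)."""
    kept = []
    for i, p in enumerate(basis):
        hp = head(p)
        redundant = any(
            divides(head(q), hp) and (head(q) != hp or j < i)
            for j, q in enumerate(basis) if j != i
        )
        if not redundant:
            kept.append(p)
    return kept

# Illustrative Groebner basis in x, y, z with exponents ordered (x, y, z).
basis = [
    {(1, 0, 0): 1, (0, 2, 1): 1},    # x + y^2 z
    {(0, 3, 1): 1, (0, 2, 1): -1},   # y^3 z - y^2 z
    {(1, 1, 0): 1, (1, 0, 0): -1},   # x y - x
    {(2, 0, 0): 1, (1, 0, 1): 1},    # x^2 + x z
]
minimal = delete_redundant(basis)
print([head(p) for p in minimal])
```

In this example the polynomials with head terms x y and x^2 are dropped because both head terms are multiples of x.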


6  Optimizations 

The  most  time-consuming  steps  in  the  algorithm  are: 

•  normal  form  computations  of  terms, 

•  solving  linear  equations  to  generate  polynomials  expressed  using  (possibly)  reduced  terms 
with  respect  to  G2  in  the  ideal, 

•  testing  for  ideal  equality,  and 

•  testing  for  the  output  to  be  a  Grobner  basis. 

6.1  Generating  normal  forms  of  terms 

Computing  the  normal  form  of  a  term  t  is  done  in  two  steps. 

1.  Generate  a  polynomial  (which  we  call  a  subnormal  form)  whose  normal  form  is  the  same 
as  that  of  t.  An  objective  here  is  to  use  normal  forms  already  computed  and  saved  in  a 
hash-table. 

2.  Compute  the  normal  form  of  a  subnormal  form. 

• Generating a subnormal form. Since terms are being generated in a degree compatible order by the procedure NextTerm, when a term t is generated, the set ReducedTerms contains terms, all of which are smaller than t w.r.t. <e. For every variable $x_i$ which divides t, it must be the case that the term $t/x_i \in \mathit{ReducedTerms}$ (as otherwise, there must exist a variable $x_k$ such that $t/x_k$ is divisible by some term in InitialTerms, and hence so is t, so t should not have been generated anyway). Let $x_s$ be the smallest variable which divides the term t. The normal form of t is computed as follows:

$$NF(t) = \mathit{Normalize}(x_s \times NF(t/x_s))$$

The polynomial $x_s \times NF(t/x_s)$ is called a subnormal form of t.

• Computing the normal form of a subnormal form. The procedure Normalize is invoked on a polynomial such that all the terms in it are smaller than t w.r.t. <1. Since the summation of the normal forms of terms gives the normal form of the whole polynomial, some of the terms of this polynomial which are smaller than t w.r.t. <e can be replaced by their normal forms, which have already been computed and stored. It turns out that in practice, if the normal forms of the terms not smaller than t w.r.t. <e are also computed and stored, they can be used in subsequent computations, making the normalization process more efficient. If <1 coincides with the enumeration term order <e, i.e., <1 is degree compatible, then we will have normal forms of all the terms of the polynomial which are not divisible by any term in InitialTerms, and this further simplifies the computation of normal forms.

6.2  Solving  Linear  Equations 

Instead of invoking Solve after every term is generated, it is more efficient to invoke it only after all the terms of a certain degree have been generated. For every degree, we invoke Solve, and if new polynomials are generated to be included in the new basis, we normalize G1 w.r.t. the new basis collected till that point, and if all the polynomials in G1 reduce to zero, we test the new basis for a Gröbner basis. For solving a system of equations, the optimization used for the zero-dimensional case ([1]) can be used.

6.3 Normalizing G1 and testing G2 for Gröbner basis

• Partial normalization of G1. The test for checking whether every polynomial in G1 reduces to 0 using the new basis is done incrementally when new polynomials are added to the new basis. If during this test a polynomial in G1 does not reduce to 0, false is returned. Next time the same test is invoked, there is no need to check again the polynomials in G1 which earlier reduced to 0, since polynomials are not being deleted from the new basis. The test can resume from a normal form of the last polynomial in G1 that did not reduce to 0.

• Testing G2 for Gröbner basis. This test is also done incrementally. Firstly, we need to consider only those polynomials in the new basis whose head terms are not reducible by the other polynomials in the new basis. For any distinct pair of such polynomials in the new basis, if the associated S-polynomial (critical pair) does not reduce to 0, false is returned. Next time this test is invoked, we can resume from this S-polynomial itself (assuming the head term of any of the polynomials in the new basis from which the S-polynomial was generated is not reducible by any new polynomial added to the new basis). Finally, the criteria (like Buchberger's two criteria [4]) which minimize the number of S-polynomials needed to be reduced in the classical algorithm can be used to reduce the number of S-polynomials which must be considered for a Gröbner basis test.

Because of these optimizations, polynomials in G1 are reduced exactly once.

7 Examples

Below  we  illustrate  the  main  steps  of  the  general  algorithm  on  a  couple  of  very  simple  examples. 

Example 1: Consider the following Gröbner basis of a one dimensional ideal w.r.t. the total degree order with variables x > y > z,

$$G_1 = \{yz + x,\ xy + z,\ x^2 - z^2\}.$$

The general algorithm can be used to transform G1 into G2, where <2 is the pure lexicographic order with x > y > z. In the following trace, we present snapshots of the values of the sets ReducedTerms, InitialTerms and G2 after all the terms of a certain degree have been generated and the call to Solve has been made. Finally, it is indicated how much of G1 has been reduced by the current G2. In the last case, we start with G1 at degree 1 and keep reducing at each step until this set reduces to φ, and then test G2 for a Gröbner basis.

After all the terms of degree one have been produced

ReducedTerms = $\{z, y, x\}$,

InitialTerms = $\phi$,

G2 = $\phi$,

G1 = $\{yz + x,\ xy + z,\ x^2 - z^2\}$ reduced to $\{yz + x,\ xy + z,\ x^2 - z^2\}$.

After all the terms of degree two have been produced

ReducedTerms = $\{z, y, z^2, yz, y^2\}$,

InitialTerms = $\{x\}$,

G2 = $\{x + yz\}$,

G1 = $\{yz + x,\ xy + z,\ x^2 - z^2\}$ reduced to $\{y^2z - z,\ x^2 - z^2\}$.

After all the terms of degree three have been produced

ReducedTerms = $\{z, y, z^2, yz, y^2, z^3, yz^2, y^2z, y^3\}$,

InitialTerms = $\{x, y^2z\}$,

G2 = $\{x + yz,\ y^2z - z\}$,

$\{y^2z - z,\ x^2 - z^2\}$ reduced to $\phi$.

Since,  at  this  point,  every  polynomial  from  the  original  Grobner  basis  was  reduced  to  zero,  G2 
is  tested  for  a  Grobner  basis.  The  test  succeeds.  Hence  G2  is  the  Grobner  basis  of  the  ideal 
generated  by  Gi  w.r.t.  the  pure  lexicographic  order  <2. 

Example 2: Consider another Gröbner basis of a one dimensional ideal w.r.t. the total degree order with variables x > y > z:

$$G_1 = \{xy - x,\ x^2 + xz,\ y^2z + x\}.$$

The goal is to transform G1 into G2, where <2 is the pure lexicographic order with x > y > z. Notice that in this example, when the polynomials in the original Gröbner basis G1 reduce to 0, G2 is still not a Gröbner basis. One more step is needed. Also, after degree 2, some polynomials are obtained which belong to the ideal, but later other polynomials are produced whose head terms divide the head terms of these polynomials.

After all the terms of degree one have been produced

ReducedTerms = $\{z, y, x\}$,

InitialTerms = $\phi$,

G2 = $\phi$,

G1 = $\{xy - x,\ x^2 + xz,\ y^2z + x\}$ reduced to $\{xy - x,\ x^2 + xz,\ y^2z + x\}$.

After all the terms of degree two have been produced

ReducedTerms = $\{z, y, x, z^2, yz, y^2, xz\}$,

InitialTerms = $\{xy, x^2\}$,

G2 = $\{xy - x,\ x^2 + xz\}$,

G1 = $\{xy - x,\ x^2 + xz,\ y^2z + x\}$ reduced to $\{y^2z + x\}$.

After all the terms of degree three have been produced

ReducedTerms = $\{z, y, z^2, yz, y^2, z^3, yz^2, y^2z, y^3\}$,

InitialTerms = $\{x\}$,

G2 = $\{x + y^2z,\ xy - x,\ x^2 + xz\}$,

$\{y^2z + x\}$ reduced to $\phi$.

At this point, since all polynomials in G1 reduce to 0, G2 must be tested for a Gröbner basis. This test fails because the S-polynomial of $\{x + y^2z,\ xy - x\}$ does not reduce to 0. So we enumerate terms of degree 4.

After all the terms of degree four have been produced

ReducedTerms = $\{z, y, z^2, yz, y^2, z^3, yz^2, y^2z, y^3, yz^3, y^2z^2, y^4\}$,

InitialTerms = $\{x, y^3z\}$,

G2 = $\{x + y^2z,\ y^3z - y^2z,\ xy - x,\ x^2 + xz\}$.

Again the test for a Gröbner basis is performed and this time it succeeds. Hence G2 is a Gröbner basis for the ideal generated by G1. The reduced Gröbner basis is obtained from G2 by deleting polynomials whose head terms can be reduced by the head terms of other polynomials.

8 Empirical results

The general algorithm has been implemented in the GEOMETER system [3]. The heuristics outlined in section 6 have been tried. A degree ordering was used for enumeration of terms. Polynomials in the input Gröbner basis were incrementally reduced to 0 to ensure that the output generated the same ideal. At the end, a Gröbner basis test was incrementally run in which S-polynomials are reduced to 0. The performance of the general basis conversion algorithm is compared with the classical Buchberger's algorithm using the normal selection strategy [4]. Some of the examples tried for comparison are given below. The polynomials for each set are written in prefix notation, and a Gröbner basis was computed w.r.t. the term order A > B > C > D > E > F.

1.  (+  A  B  C  D  E  F) 

(+ (* A B) (* B C) (* C D) (* D E))

(+  (*  A  B  C)  (*  B  C  D)  (*  C  D  E)) 

(+  (*  A  B  C  D)  (*  C  B  D  E)) 

(+  (*  A  B  C  D  E)  -1) 

2.  (+  A  B  C  D  E  F) 

(+ (* A B) (* B C) (* C D) (* D E))

(+  (*  A  B  C)  (*  B  C  D)  (*  C  D  E)) 

(+  (*  A  B  C  D)  (*  B  C  D  E)) 

(+  (*  A  B  C  D  E  F)  -1) 

3.  (+  A  B  C  D  E  F) 

(+ (* A B) (* B C) (* C D) (* D E) (* E F))

(+  (*  A  B  C)  (*  B  C  D)  (*  C  D  E)) 

(+  (*  A  B  C  D)  (*  B  C  D  E)) 

(+  (*  A  B  C  D  E  F)  -1) 

4.  (+  A  B  C  D  E  F  -1) 

(+ (* A B) (* B C) (* C D) (* D E) -1)

(+  (*  A  B  C)  (*  B  C  D)  (*  C  D  E)) 

(+  (*  A  B  C  D)  (*  B  C  D  E)) 

(+  (*  A  B  C  D  E  F)  -1) 

In Figure 1, timings are given for computing a degree basis on GeoMeter (a), converting to a lexicographic basis from the degree basis on GeoMeter (b), total time taken to obtain the lexicographic basis by conversion (c) (= a + b), obtaining the lexicographic basis directly on GeoMeter (d) and finally, for obtaining the lexicographic basis directly on Maple (e). A star (*) entry indicates that either the program ran out of space or it ran for more than 24 hours.

It  was  found  that  for  most  nontrivial  examples,  BasisConvert  performed  better  than  the 
classical  algorithm  as  implemented  in  GEOMETER  and  MAPLE.  Comparisons  with  other 
systems  are  difficult  to  make  because  their  implementations  and  handling  of  data  structures 
could  possibly  be  totally  different. 

There is considerable room for improving the performance of the BasisConvert algorithm. We earlier mentioned issues related to enumeration orderings as well as tests for termination.

[Table of timings (a) to (e) per example; surviving entries: 811, 35, 368, 19213, 19581, *, *]

Figure 1: Comparison of BasisConvert with direct computation of Gröbner basis

Further,  currently  we  are  using  a  naive  Gaussian  elimination  procedure  in  Solve  using  the 
kcl  infinite  precision  rational  arithmetic.  A  better  implementation  of  a  Gaussian  elimination 
method  is  likely  to  result  in  much  better  timings. 


References 

[1] J.C. Faugère, P. Gianni, D. Lazard and T. Mora, Efficient computation of zero-dimensional Gröbner bases by change of ordering. To appear in: J. Symb. Comp. 1990.

[2] T. Becker, V. Weispfenning in cooperation with H. Kredel, Gröbner Bases - A Computational Approach to Commutative Algebra. Springer Verlag, 1993.

[3] C.I. Connolly, D. Kapur, J.L. Mundy and R. Weiss, GeoMeter: A System for Modeling and Algebraic Manipulation. Proc. of DARPA Workshop on Image Understanding, Palo Alto, CA, pp. 797-804, 1989.

[4] B. Buchberger, Gröbner bases: An algorithmic method in polynomial ideal theory. In: (ed. N.K. Bose) Multidimensional System Theory, D. Reidel Publishing Co, 1985, 184-232.




Sparsity  Considerations  in  Dixon  Resultants* 


Deepak  Kapur  and  Tushar  Saxena 
Institute  for  Programming  and  Logics 
Department  of  Computer  Science 
State  University  of  New  York  at  Albany 
Albany,  NY  12222 

April  26,  1995 


Abstract 

New results relating the sparsity of non-homogeneous polynomial systems and computation of their projection operator (i.e., a non-trivial multiple of the multivariate resultant) using Dixon's method are developed. It is proved that the Dixon formulation of resultants, despite being classical, implicitly exploits the structure of the Newton polytopes of input polynomials; the complexity of computing the Dixon resultant is not determined by the total degree of the polynomial system. A bound on the size of the Dixon matrix of unmixed polynomial systems is derived in terms of their Newton polytopes. This bound is used to prove that for a multi-homogeneous system, the size of its Dixon matrix is of a smaller order than its n-fold mixed volume. Using dense multivariate polynomial interpolation techniques, it is shown that for a fixed number of variables, the Dixon matrix
of multi-homogeneous polynomial systems can be constructed using $O(M^3)$ arithmetic operations,
where  M  is  the  n-fold  mixed  volume  of  the  input  system.  The  Dixon  matrix  is  found  to  be 
smaller  than  the  sparse  and  Macaulay  resultant  matrices  by  a  factor  exponential  in  the  number 
of  variables,  and  the  Dixon  formulation  yields  a  faster  algorithm  to  compute  the  resultant.  This 
work  links  the  classical  Dixon  formulation  (developed  in  1908)  to  the  modern  line  of  sparsity 
analysis  based  on  Newton  polytopes. 


1  Introduction 

Eliminating n variables from n+1 multivariate polynomials yields the resultant, a quantity which has proved useful in many applications [8]. Most efficient ways to compute the resultant express it as a determinant. However, a pure determinantal expression for the resultant is rare (see [13] for some classes of polynomials which allow a pure determinantal formulation); in the remaining cases, one can still express some nontrivial multiple of the resultant (called a projection operator) as a determinant.

There  are  three  major  methods  to  compute  the  resultant  or  a  projection  operator  -  the  classical 
formulations  by  Dixon  [4]  and  Macaulay  [12]  developed  early  this  century,  and  the  recently  developed 
sparse  resultant  formulation  [11,  6].  All  three  methods  construct  matrices  whose  determinant  is  either 
the  resultant,  or  a  projection  operator. 

The Macaulay formulation uses the traditional Bezout bound on the number of solutions of a polynomial system to construct the Macaulay matrix. The size of this matrix depends on the total degree of the input polynomials. However, recently, sharper bounds have been derived on the number of solutions, known as BKK bounds [1], which exploit the structure and sparsity of the polynomials as determined by their Newton polytopes. The sparse resultant formulation is a modification of the Macaulay formulation which exploits the modern BKK bound and Newton polytopes of the input polynomials to construct smaller matrices [7]. The size of the sparse resultant matrix depends on the volume of the polytopes of input polynomials, and is not governed by their total degree. The sparse resultant matrix is small because it does not assume that any monomials, besides the ones inside the Newton polytope, have non-zero coefficients. The Macaulay matrix, by contrast, is much larger because it is constructed under the crude assumption that all monomials of total degree less than or equal to that of the input polynomial have non-zero coefficients, without regard to the more refined structure of the polynomial in terms of its Newton polytope.
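To make the contrast concrete, the following sketch counts the monomials that a dense total-degree model assumes against the lattice points of a box-shaped Newton polytope, for a polynomial whose degree is bounded separately in each variable. The function names are hypothetical and the box is only one simple polytope shape chosen for illustration; the sketch says nothing about the actual matrix sizes of any resultant formulation.

```python
from math import comb, prod

def dense_monomial_count(nvars, total_degree):
    """Number of monomials of total degree <= total_degree in nvars variables."""
    return comb(nvars + total_degree, nvars)

def box_point_count(degree_bounds):
    """Lattice points of the box [0, d1] x ... x [0, dn], the Newton polytope
    of a generic polynomial of degree at most d_i in the i-th variable."""
    return prod(d + 1 for d in degree_bounds)

# A polynomial of degree 1 in each of three variables has total degree 3.
bounds = [1, 1, 1]
print(dense_monomial_count(len(bounds), sum(bounds)))  # dense model
print(box_point_count(bounds))                         # sparse support
```

For this multilinear example the dense model allows 20 monomials while the box contains only 8, and the gap widens as the number of variables grows.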

Methods  based  on  the  Dixon  formulation  have  been  empirically  found  to  be  more  efficient  than 
the  Macaulay  and  the  sparse  resultant  formulations  [9,  10].  However,  being  a  classical  approach 
(developed  in  1908),  the  relationship  between  the  Dixon  formulation  and  the  modern  BKK  bounds, 
or the structure of the input polytopes, is not well understood. In [11], Sturmfels gave the bracket
representation of the Dixon resultant for bi-homogeneous systems; however, the Dixon resultant was
not analyzed in its full generality. Moreover, it has been unclear whether the Dixon formulation
exploits  the  sparsity  of  input  polynomials,  and  if  so,  to  what  extent  (quantitatively).  Little  has  been 
known  about  the  size  of  the  Dixon  matrix  and  its  relationship  to  the  Newton  polytopes  of  the  input 
polynomials. 

This paper makes three contributions. The first contribution is to show that the Dixon formulation
of multivariate resultants implicitly exploits the structure of Newton polytopes of input
polynomials  and  is,  thus,  not  governed  by  their  total  degree  (or  the  classical  Bezout  bound).  Using 
this  relationship,  an  explicit  bound  on  the  size  of  Dixon  matrix  is  derived  in  terms  of  the  volume 
of  polytopes  of  input  polynomials,  and  this  is  the  second  and  main  contribution  of  the  paper.  It  is 
shown  that  the  size  of  the  Dixon  matrices  of  unmixed  systems  is  bounded  by  the  number  of  integral 
points  inside  the  Minkowski  sum  of  successive  projections  of  their  Newton  polytopes  (Theorem  3.4 
and Corollary 3.5). The third contribution of the paper is an upper bound on the complexity of
constructing  Dixon  matrices.  This  is  especially  significant  since  entries  in  a  Dixon  matrix  are  more 
complex  in  contrast  to  entries  in  a  Macaulay  matrix  or  a  sparse  resultant  matrix,  which  are  simply 
the  coefficients  of  terms  in  polynomials.  This  bound  on  the  construction  of  Dixon  matrix  shows 
that  despite  more  complicated  entries,  it  can  be  constructed  fast.  These  contributions  are  further 
elaborated  below. 

It  is  shown  in  this  paper  that,  much  like  the  sparse  resultant  formulation,  the  Dixon  formulation 
assumes  that  all  monomials  outside  the  polytopes  have  zero  coefficients.  If  the  polytopes  of  two 
polynomial  systems  have  the  same  volumes  and  projections,  then  no  matter  how  different  their  total 
degrees,  their  Dixon  matrices,  like  their  sparse  resultant  matrices,  would  be  of  the  same  size.  The 
size  of  the  Dixon  matrix  is  much  smaller  than  the  size  of  Macaulay  matrix.  It  is  also  shown  that 
the  size  of  the  Dixon  matrix  is  smaller  than  the  size  of  sparse  resultant  matrix,  which  is  bounded 
by  the  number  of  integral  points  inside  the  Minkowski  sum  of  the  Newton  polytopes  (not  their 
successive  projections)  of  the  input  polynomials.  This  is  analogous  to  the  Dixon  matrix  whose  size 
is  bounded  by  the  number  of  integral  points  inside  the  Minkowski  sum  of  the  successive  projections 
of  Newton  polytopes  of  input  polynomials.  However,  since  the  projections  are  of  successively  lower 
dimension  than  the  polytopes  themselves,  the  Dixon  matrix  is  smaller.  Specifically,  the  size  of  the 
Dixon  matrix  of  multi-homogeneous  polynomials  is  proved  to  be  of  smaller  order  than  their  n-fold 
mixed  volume  (Theorem  3.8),  whereas  the  size  of  the  sparse  resultant  matrix  is  larger  than  the  n-fold 
mixed  volume  by  an  exponential  multiplicative  factor  (Theorem  3.9).  If  the  n-fold  mixed  volume 
of a multi-homogeneous polynomial system is M and the number of variables being eliminated is n,
then for an asymptotically large number of partitions of variables (approaching n),

Size  of  its  Dixon  matrix  =  O  ( — r-r-^  )  > 

\n  +  1  y 

Size of its sparse resultant matrix = $O(e^{n+1} M)$.  (2)

This is actually congruent with the intuition Dixon discussed in his paper, because he mainly developed
this formulation as an efficient method for n-degree polynomial systems (defined in a footnote in
Section 2), and for an asymptotically large number of partitions, multi-homogeneous systems approach
the n-degree case.

Regarding  the  construction  of  the  Dixon  matrix,  it  is  shown  that  it  can  be  constructed  fast  using 
a  variant  of  the  dense  multivariate  polynomial  interpolation  algorithm,  which  exploits  the  bounds 
on  the  size  of  the  Dixon  matrix.  Specifically,  for  multi-homogeneous  systems  with  a  fixed  number  of 
variables, the Dixon matrix can be constructed using $O(M^3)$ arithmetic operations (Theorem 4.1).

Putting these results together, the cost of computing a projection operator using the Dixon
formulation is the sum of (i) the cost of constructing the Dixon matrix and (ii) the cost of computing
its determinant. The number of arithmetic operations required for (ii) is $O(M^3)$ (see Equation 1).
Consequently, for a fixed number of variables, the computation of a projection operator using the
Dixon formulation takes $O(M^3)$ arithmetic operations. This is in contrast to the sparse resultant
formulation which, if the polynomials are full, takes asymptotically more arithmetic operations for
a fixed number of variables [7] (see Section 4.2). If the polynomials are hollow, the construction
time of the sparse resultant matrix reduces considerably and the determinant computation dominates,
but it still takes longer than the Dixon formulation because the matrix size is larger (Equation 2).

To  the  best  of  our  knowledge,  this  is  the  first  time  such  a  relationship  has  been  shown  between 
the  classical  Dixon  formulation  and  the  modern  line  of  analysis  based  on  Newton  polytopes  of 
polynomials.  The  analysis  based  on  this  relationship  reveals  that  the  Dixon  formulation  does  take 
into  account  the  sparsity  of  input  polynomials,  creates  smaller  resultant  matrices,  and  has  lower 
complexity  than  the  sparse  resultant  formulation. 


2  Dixon  Method  for  Computing  Projection  Operators 

Let $X = \{x_1, \ldots, x_n\}$ and $\bar{X} = \{\bar{x}_1, \ldots, \bar{x}_n\}$ be two sets of $n$ variables each. Given a polynomial
$p \in \mathbb{Q}[X]$, $p(\bar{x}_1, \ldots, \bar{x}_k, x_{k+1}, \ldots, x_n)$ stands for uniformly replacing $x_j$ by $\bar{x}_j$ for $1 \le j \le k$ in
$p$. For a given polynomial $q \in \mathbb{Q}[X, \bar{X}]$, $V_X(q)$ and $V_{\bar{X}}(q)$, the monomial vectors of $q$ wrt. $X$
and $\bar{X}$ respectively, are defined to be the vectors of all monomials appearing in $q$ in the variables
$X$ and $\bar{X}$ respectively. The exponent vector of the monomial $x_1^{e_1} \cdots x_n^{e_n} \bar{x}_1^{\bar{e}_1} \cdots \bar{x}_n^{\bar{e}_n}$ wrt. the set
$X$ of variables is defined to be the $n$-tuple $(e_1, \ldots, e_n) \in \mathbb{N}^n$. The support, $supp(q) \subset \mathbb{N}^n$, of a
polynomial $q$ is the set of exponent vectors of all the monomials, wrt. $X$, with non-zero coefficients
in $q$. The determinant of a square matrix $A$ is denoted by $|A|$.

Example:  If  X  =  {xi,X2}  and  p  =  2xfx2xf  +  x|xix^  +  xf  -|-  xiX2  +  1  6  then  Vxip)  = 

[xfx2,  x|,  1]  and  Vx{p)  =  [x?,  xix^,  x?,  xiX2,  l].  supp{p)  -  {(3, 1), (0,4), (0,0)}.  □ 
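The definitions of monomial vectors and support can be illustrated with a minimal Python sketch. The dictionary representation, the function names `support` and `monomial_vectors`, and the polynomial `q` are assumptions made for this illustration only; `q` is a hypothetical polynomial, not the one of the example above.

```python
# A polynomial in Q[X, Xbar] with n = 2 is stored as a dict mapping
# (exponent vector wrt X, exponent vector wrt Xbar) -> non-zero coefficient.
# Hypothetical polynomial: q = 2*x1^3*x2*xb1^2 + x2^4*xb1*xb2 + xb1^2 + 1
q = {
    ((3, 1), (2, 0)): 2,
    ((0, 4), (1, 1)): 1,
    ((0, 0), (2, 0)): 1,
    ((0, 0), (0, 0)): 1,
}

def support(poly):
    """Exponent vectors wrt X of all monomials with non-zero coefficients."""
    return {ex for (ex, _), c in poly.items() if c != 0}

def monomial_vectors(poly):
    """Return (V_X, V_Xbar) as lists of distinct exponent vectors."""
    v_x = sorted({ex for (ex, _), c in poly.items() if c != 0}, reverse=True)
    v_xbar = sorted({eb for (_, eb), c in poly.items() if c != 0}, reverse=True)
    return v_x, v_xbar

v_x, v_xbar = monomial_vectors(q)
print("supp(q) =", support(q))
print("V_X     =", v_x)
print("V_Xbar  =", v_xbar)
```

Note that two terms of `q` share the exponent vector (0, 0) wrt. X, so the support has three elements although `q` has four terms.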


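Footnote: $n + 1$ nonhomogeneous polynomials $p_1, \ldots, p_{n+1}$ in $x_1, \ldots, x_n$ are called n-degree if there exist nonnegative integers
$m_1, \ldots, m_n$ such that each

$$p_i = \sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} a_{i,i_1,\ldots,i_n}\, x_1^{i_1} \cdots x_n^{i_n} \quad \text{for } 1 \le i \le n + 1.$$

Footnote: In a full polynomial, all terms inside its Newton polytope have non-zero coefficients.

Footnote: In a hollow polynomial, only the terms on the boundary of its Newton polytope have non-zero coefficients.

Definition 2.1 Let $P = \{p_1, \ldots, p_{n+1}\} \subset \mathbb{Q}[X]$ be a set of $n + 1$ polynomials in $n$ variables.
The cancellation matrix of $P$, denoted $C_P$, is

$$
C_P = \begin{bmatrix}
p_1(x_1, x_2, x_3, \ldots, x_n) & \cdots & p_{n+1}(x_1, x_2, x_3, \ldots, x_n) \\
p_1(\bar{x}_1, x_2, x_3, \ldots, x_n) & \cdots & p_{n+1}(\bar{x}_1, x_2, x_3, \ldots, x_n) \\
p_1(\bar{x}_1, \bar{x}_2, x_3, \ldots, x_n) & \cdots & p_{n+1}(\bar{x}_1, \bar{x}_2, x_3, \ldots, x_n) \\
\vdots & & \vdots \\
p_1(\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_n) & \cdots & p_{n+1}(\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots, \bar{x}_n)
\end{bmatrix}.
$$

The Dixon polynomial of $P$, denoted $\delta_P \in \mathbb{Q}[X, \bar{X}]$, is

$$\delta_P = \frac{|C_P|}{\prod_{i=1}^{n} (x_i - \bar{x}_i)},$$

and the Dixon matrix of $P$, denoted $D_P$, is the matrix for which $\delta_P = V_{\bar{X}}(\delta_P)\, D_P\, V_X(\delta_P)$.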
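For a single variable (n = 1) the construction reduces to the classical Bezout matrix, which makes a small worked sketch possible. The code below is an illustration under stated assumptions, not the implementation used in this paper: polynomials are coefficient lists (lowest degree first), the function names are hypothetical, and the division by $(x - \bar{x})$ is done by synthetic division.

```python
from fractions import Fraction

def dixon_matrix_univariate(p, q):
    """Dixon (Bezout) matrix of two univariate polynomials.
    D[j][i] is the coefficient of xbar^j * x^i in the Dixon polynomial."""
    m = max(len(p), len(q)) - 1
    a = list(p) + [0] * (m + 1 - len(p))
    b = list(q) + [0] * (m + 1 - len(q))
    # num[i][j]: coefficient of x^i * xbar^j in |C_P| = p(x) q(xbar) - p(xbar) q(x)
    num = [[a[i] * b[j] - a[j] * b[i] for j in range(m + 1)] for i in range(m + 1)]
    # Synthetic division by (x - xbar), viewing |C_P| as a polynomial in x with
    # coefficients in xbar: Q_{k-1} = num_k + xbar * Q_k, starting from Q_m = 0.
    quo = [None] * m
    carry = [0] * (m + 1)
    for k in range(m, 0, -1):
        shifted = [0] + carry[:-1]            # multiply Q_k by xbar
        carry = [num[k][j] + shifted[j] for j in range(m + 1)]
        quo[k - 1] = carry[:m]
    shifted = [0] + carry[:-1]
    remainder = [num[0][j] + shifted[j] for j in range(m + 1)]
    assert not any(remainder), "division by (x - xbar) must be exact"
    return [[quo[i][j] for i in range(m)] for j in range(m)]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n, det = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return det

d1 = dixon_matrix_univariate([-1, 0, 1], [-2, 1])   # x^2 - 1 and x - 2
d2 = dixon_matrix_univariate([2, -3, 1], [-1, 1])   # (x - 1)(x - 2) and x - 1
print(d1, determinant(d1))
print(d2, determinant(d2))
```

The second pair shares the root x = 1, so the determinant of its Dixon matrix vanishes, while the first pair has no common root and gives a non-zero determinant.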

If  the  Dixon  matrix  is  square,  its  determinant  is  a  projection  operator  whose  vanishing  is  a 
necessary condition for the system P to have a common solution in the algebraic torus $(\mathbb{C} - \{0\})^n$.
However,  if  the  Dixon  matrix  is  rectangular  then  the  determinant  (and  the  projection  operator) 
cannot  be  directly  computed.  Even  if  it  is  square,  it  may  be  singular,  resulting  in  a  trivial  projection 
operator,  which  is  undesirable.  The  following  theorem  given  in  [9],  provides  a  method  to  extract  a 
non-trivial  projection  operator  in  such  instances. 

Theorem  2.2  If  Dp  has  a  column  which  is  linearly  independent  of  all  other  columns,  then  the 
determinant  of  any  non-singular  rank  submatrix  of  Dp  is  a  non-trivial  projection  operator. 

This  method  of  computing  the  projection  operator  is  quite  efficient  in  practice  as  demonstrated  in 
[10].  As  can  be  noticed  from  the  theorem,  this  method  may  fail  if  there  is  no  column  in  the  Dixon 
matrix which is linearly independent of all other columns. However, such failure seems very rare; in
fact,  it  has  never  occurred  on  the  numerous  problems  that  we  have  tried  from  different  application 
domains.  (For  systems  with  two  variables,  it  can  be  proved  that  this  method  always  works  [4].) 
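The extraction step of Theorem 2.2 can be sketched on a purely numeric matrix. This is a simplified illustration under assumptions: real Dixon matrices have polynomial entries, the precondition of the theorem (a column independent of all others) is not checked here, and the function names are hypothetical. The sketch selects pivot rows and columns by exact Gaussian elimination, which yields a non-singular submatrix whose size equals the rank, and returns its determinant.

```python
from fractions import Fraction

def pivot_rows_and_columns(matrix):
    """Indices of rows and columns that form a non-singular rank submatrix."""
    a = [[Fraction(v) for v in row] for row in matrix]
    used, piv_rows, piv_cols = set(), [], []
    for c in range(len(a[0])):
        r = next((i for i in range(len(a)) if i not in used and a[i][c] != 0), None)
        if r is None:
            continue                      # column depends on earlier pivot columns
        used.add(r)
        piv_rows.append(r)
        piv_cols.append(c)
        for i in range(len(a)):
            if i not in used:
                f = a[i][c] / a[r][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
    return piv_rows, piv_cols

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n, det = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return det

def rank_submatrix_determinant(matrix):
    """Return (rank, determinant of the selected rank submatrix)."""
    rows, cols = pivot_rows_and_columns(matrix)
    sub = [[matrix[r][c] for c in cols] for r in rows]
    return len(rows), determinant(sub)

# Hypothetical singular 3 x 3 matrix (second row is twice the first).
rank, det = rank_submatrix_determinant([[1, 2, 3], [2, 4, 6], [1, 0, 1]])
print(rank, det)
```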

Below  we  establish  a  relationship  between  the  Newton  polytopes  of  the  input  polynomials  and 
the  size  of  the  Dixon  matrix,  and  compare  it  with  resultant  matrices  in  other  formulations. 

3  Newton Polytopes and the Dixon Matrix

Let the convex hull of a set $A \subset \mathbb{R}^n$ be denoted by $conv(A) \subset \mathbb{R}^n$. The Newton polytope of a
polynomial $p$ is $\mathcal{N}(p) = conv(supp(p)) \subset \mathbb{R}^n$, where, as defined above, $supp(p)$ is the set of exponent
vectors of all the monomials wrt. $X$ in $p$. For a polynomial $p$, $supp(p) \subseteq \mathcal{N}(p) \cap \mathbb{Z}^n$.

Given a set of points $A \subset \mathbb{R}^n$ and $0 \le i \le n$, $\pi_i(A) \subset \mathbb{R}^n$ stands for the set obtained by replacing
the first $i$ coordinates of all points in $A$ by 0, a projection of $A$ to $n - i$ dimensions. We will abuse the
terminology and call this projection the $i$-th projection of $A$. $\pi_0(A) = A$ and $\pi_n(A) = \{(0, \ldots, 0)\}$.
The Minkowski sum of sets $A, B \subset \mathbb{R}^n$ is $A + B = \{a + b \mid a \in A, b \in B\} \subset \mathbb{R}^n$, or equivalently,

$$A + B = \bigcup_{a \in A} (a + B). \qquad (3)$$
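The two operations just defined can be sketched for finite point sets as follows. The function names `project` and `minkowski_sum` and the point set `A` are assumptions for this illustration; polytopes themselves are convex hulls and are not computed here.

```python
def project(points, i):
    """pi_i: replace the first i coordinates of every point by 0."""
    return {(0,) * i + tuple(p[i:]) for p in points}

def minkowski_sum(a, b):
    """A + B = {a + b | a in A, b in B} for finite point sets."""
    return {tuple(x + y for x, y in zip(p, q)) for p in a for q in b}

# Hypothetical point set in the plane (corners of a 2 x 1 rectangle).
A = {(0, 0), (2, 0), (0, 1), (2, 1)}

# Sum of the successive projections pi_0(A) + pi_1(A) + pi_2(A).
total = project(A, 0)
for i in (1, 2):
    total = minkowski_sum(total, project(A, i))
print(sorted(total))
```

The first coordinate is contributed only by $\pi_0(A)$, while the second is contributed by both $\pi_0(A)$ and $\pi_1(A)$, so the sum stretches only in the second direction.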

Fact 3.1 Given a convex set $A \subset \mathbb{R}^n$ and two polynomials $p$ and $q$, if $\mathcal{N}(p) \subseteq A$ and $\mathcal{N}(q) \subseteq A$, then
$\mathcal{N}(p + q) \subseteq A$.

Fact 3.2 Given two polynomials, $p$ and $q$, $\mathcal{N}(pq) = \mathcal{N}(p) + \mathcal{N}(q)$.

For simplicity, it is assumed from now onwards that the input system is unmixed, i.e., the
Newton polytopes of all polynomials are equal. For such a set, $P$, let $\mathcal{N}(P)$ denote the Newton
polytope of any polynomial in $P$. The following lemma relates the Newton polytope of the determinant
of the cancellation matrix to the polytopes of input polynomials.

Lemma 3.3 For an unmixed set, $P \subset \mathbb{Q}[X]$, of $n + 1$ polynomials, $\mathcal{N}(|C_P|) \subseteq \sum_{i=0}^{n} \pi_i(\mathcal{N}(P))$,
where $C_P$ is the cancellation matrix of $P$.

Proof: The Newton polytope of all entries in row $i + 1$ of the cancellation matrix is $\pi_i(\mathcal{N}(P))$. If
$|C_P|$ is expanded as a sum of products, then each summand is a product of exactly $n + 1$ polynomials,
one from each row. From Fact 3.2, the polytope of this summand is $\sum_{i=0}^{n} \pi_i(\mathcal{N}(P))$, so the result
follows from Fact 3.1. □

We  can  relate  the  Dixon  polynomial  to  the  input  polytopes  as  follows: 

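Theorem 3.4 For an unmixed set, $P \subset \mathbb{Q}[X]$, of $n + 1$ polynomials,

$$supp(\delta_P) \subseteq \left( \sum_{i=0}^{n} \pi_i(\mathcal{N}(P)) \right) \cap \mathbb{Z}^n.$$

Proof: Let $q = \prod_{i=1}^{n} (x_i - \bar{x}_i)$. Polynomial $q$ contains the term $\bar{x}_1 \cdots \bar{x}_n$ (up to sign), which is a constant wrt. the
set $X$ of variables, so $(0, 0, \ldots, 0) \in \mathcal{N}(q)$. Since $|C_P| = q \times \delta_P$,

$$
\begin{aligned}
\sum_{i=0}^{n} \pi_i(\mathcal{N}(P)) &\supseteq \mathcal{N}(|C_P|) = \mathcal{N}(q \times \delta_P) = \mathcal{N}(q) + \mathcal{N}(\delta_P) && \text{(by Lemma 3.3 and Fact 3.2)} \\
&= (\mathcal{N}(q) \cup \{(0, 0, \ldots, 0)\}) + \mathcal{N}(\delta_P) = (\mathcal{N}(q) + \mathcal{N}(\delta_P)) \cup \mathcal{N}(\delta_P) && \text{(by Equation 3)} \\
&\supseteq \mathcal{N}(\delta_P) \supseteq supp(\delta_P). \qquad \Box
\end{aligned}
$$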

The number of columns in the Dixon matrix equals the cardinality of the support of the Dixon
polynomial. The cardinality of an integer point set is asymptotically bounded by the volume of its
convex hull [5]. Let $vol(A)$ denote the $n$-dimensional volume of a convex set $A \subset \mathbb{R}^n$. We get
the following bound on the size of the Dixon matrix as a function of the Newton polytopes of input
polynomials.

Corollary 3.5 The number of columns in the Dixon matrix of an unmixed set, $P \subset \mathbb{Q}[X]$, of $n + 1$
polynomials is

$$O\left( vol\left( \sum_{i=0}^{n} \pi_i(\mathcal{N}(P)) \right) \right).$$
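To make the bound concrete, the following sketch counts integer points for the simplest family of polytopes, axis-aligned boxes $[0, d_1] \times \cdots \times [0, d_n]$, for which the Minkowski sum is again a box. The restriction to boxes, the function names, and the comparison with the sum of $n + 1$ unprojected copies are assumptions chosen for illustration; general polytopes would need a convex hull computation.

```python
from itertools import product

def dixon_point_count(degrees):
    """Integer points in sum_{i=0}^{n} pi_i(N) for N = [0,d_1] x ... x [0,d_n].
    Coordinate j (1-based) is non-zero only in pi_0, ..., pi_{j-1}, so the sum
    extends to j * d_j in that coordinate."""
    count = 1
    for j, d in enumerate(degrees, start=1):
        count *= j * d + 1
    return count

def unprojected_point_count(degrees):
    """Integer points in the Minkowski sum of n + 1 copies of the same box."""
    n = len(degrees)
    count = 1
    for d in degrees:
        count *= (n + 1) * d + 1
    return count

def brute_force_dixon_point_count(degrees):
    """Cross-check by summing the lattice points of the projected boxes."""
    n = len(degrees)
    box = set(product(*(range(d + 1) for d in degrees)))
    total = box
    for i in range(1, n + 1):
        proj = {(0,) * i + p[i:] for p in box}
        total = {tuple(x + y for x, y in zip(p, q)) for p in total for q in proj}
    return len(total)

print(dixon_point_count((2, 2)), unprojected_point_count((2, 2)))
```

For the box with $d_1 = d_2 = 2$, the successive projections give 15 integer points, against 49 for the sum of three full copies of the box.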

We  believe  this  is  the  first  time  that  such  a  bound  on  the  size  of  the  Dixon  matrix  has  been  derived 
in  terms  of  the  input  polytopes.  We  now  compare  this  to  similar  results  on  the  size  of  the  Macaulay 
and  sparse  resultant  matrices. 

Footnote to Theorem 3.4: A slightly stronger version can be proved, viz., the support of the Dixon polynomial is, in fact, contained in the set of integer
lattice points inside the polytope obtained after perturbing $\sum_{i=0}^{n} \pi_i(\mathcal{N}(P))$ toward the negative octant. Note
that the number of these points is strictly less than the number determined by Theorem 3.4 because the perturbation moves some of the boundary
points out of the polytope. But this decrease does not change the analysis and bounds in the rest of the paper, so we
adhere to the theorem above.

3.1  Comparison with Macaulay Matrix

If  the  total  degree  of  each  input  polynomials  is  d,  then  the  size  of  the  Macaulay  matrix  is  O  ^  )  • 

So  the  total  degree  completely  determines  the  size  of  Macaulay  matrix,  no  matter  how  many  terms 
are  missing  from  the  polynomials.  If  a  polynomial  system  has  the  same  number  of  variables,  but 
polynomials  of  higher  total  degree  than  another,  the  Macaulay  matrix  of  the  former  is  larger,  even 
though  their  Newton  polytopes  may  have  the  same  shape  and  volume. 
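As a hedged numerical illustration of this dependence on total degree, the sketch below assumes the standard Macaulay construction, in which the columns are indexed by all monomials of total degree at most $d_M = (n + 1)(d - 1) + 1$ in $n$ variables, giving $\binom{d_M + n}{n}$ columns. Both the formula and the function name `macaulay_size` are assumptions of this sketch rather than statements taken from the text.

```python
from math import comb

def macaulay_size(n, d):
    """Number of monomials of total degree <= d_M in n variables, where
    d_M = (n + 1)(d - 1) + 1 is the assumed Macaulay degree for n + 1
    polynomials of total degree d."""
    d_m = (n + 1) * (d - 1) + 1
    return comb(d_m + n, n)

# The count depends only on n and d, not on which terms are present.
for n, d in [(2, 2), (2, 4), (3, 2)]:
    print(n, d, macaulay_size(n, d))
```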

Corollary 3.5 shows that for the Dixon matrix, this is not the case at all. In fact, the size of
the  Dixon  matrix  has  no  dependence  on  the  total  degree  of  the  polynomials,  only  on  their  Newton 
polytopes.  Sizes  of  the  Dixon  matrices  of  two  different  polynomial  systems  are  the  same  as  long  as 
their  Newton  polytopes  have  the  same  shape,  no  matter  what  the  total  degrees  of  the  polynomials. 

The  Dixon  formulation  does  consider  the  sparsity  in  the  input  polynomials  and  implicitly  exploits 
the structure of the Newton polytopes of input polynomials.

3.2  Comparison  with  Sparse  Resultant  Matrix 

A  result  analogous  to  Corollary  3.5  exists  for  the  sparse  resultant  matrix  [7]: 

Fact 3.6 The size of the sparse resultant matrix of an unmixed set, $P \subset \mathbb{Q}[X]$, of $n + 1$ polynomials is

$$O\left( vol\left( \sum_{i=0}^{n} \mathcal{N}(P) \right) \right).$$

The reader will notice the surprising similarity between Fact 3.6 and Corollary 3.5. While the sparse
resultant matrix considers integral points inside the Minkowski sum of the polytopes, the Dixon
matrix considers integral points inside the Minkowski sum of successive projections of polytopes.
Since the projections are of successively lower dimensions than the polytopes themselves, the Dixon
matrix is smaller than the sparse resultant matrix.

As  an  example  of  how  these  various  polytopes  look,  see  Figure  1.  P  is  the  polytope  of  the  unmixed 
system  pi  =  ai  x^y-\-bi  x^y~\-Ci  x^y^  -^di  x^y*^  +  x^y^  for  1  <  i  ^  3.  The  number  of  integral  points  in 
D  and  S  are  the  sizes  of  the  Dixon  and  sparse  matrices,  respectively.  The  Macaulay  matrix  is  so  large 
that the region corresponding to terms in its columns cannot be shown in this figure; both the sparse
and Dixon polytopes are subsets of this region. To quantify the difference
in sizes of the three resultant matrices, we analyze the case of multi-homogeneous polynomial
systems, which form an important class with a large number of applications in areas such as robotics
and vision.


3.3  Multi-Homogeneous  Systems 

Definition 3.7 A set of polynomials $P$ is called multi-homogeneous of type $(l_1, \ldots, l_r; d_1, \ldots, d_r)$
if it is unmixed and the set of variables can be partitioned into $r$ subsets $X_1, \ldots, X_r$ such that, for
$1 \le i \le r$, the number of variables in the set $X_i$ is $l_i$ and the total degree of each polynomial in $P$ in
the variable set $X_i$ is $d_i$.

In the rest of the paper, a multi-homogeneous system should be understood to mean a multi-homogeneous
polynomial set of type $(l_1, \ldots, l_r; d_1, \ldots, d_r)$ with $n + 1$ polynomials, where $n = \sum_{i=1}^{r} l_i$
is the total number of variables. We also assume that, for $1 \le i \le r$, the number of partition variables
$l_i = O(l)$, and the partition degree $d_i = O(d)$, and consequently, $n = O(rl)$.

Figure 1: Various polytopes for the system P

The concept of an n-fold mixed volume provides a good quantitative basis to compare different
resultant formulations. Given an unmixed set $P \subset \mathbb{Q}[X]$, its n-fold mixed volume, denoted
$mvol(P)$, is $mvol(P) = n!\, vol(\mathcal{N}(P))$. The following theorem relates the size of the Dixon matrix to the
n-fold mixed volume. Its proof is in Appendix A.

Theorem  3.8  For  asymptotically  large  r,  the  size  of  the  Dixon  matrix  of  a  multi-homogeneous 
system  P  is  O  • 

The following can be analogously derived for the sparse resultant matrix from Fact 3.6:

Theorem 3.9 For asymptotically large r, the size of the sparse resultant matrix of a multi-homogeneous
system P is $O(e^{n+1}\, mvol(P))$.

For multi-homogeneous systems, for asymptotically large r, the Dixon matrix is smaller than the
sparse resultant matrix by a factor exponential in n. It can be shown that the sparse resultant matrix
is, in turn, smaller than the Macaulay matrix by a factor exponential in n.
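The quantities in this comparison can be evaluated exactly for small cases. The sketch below rests on two assumptions that are not spelled out in the text: that the Newton polytope of a multi-homogeneous system of type $(l_1, \ldots, l_r; d_1, \ldots, d_r)$ is the product of the scaled simplices $d_i \Delta_{l_i}$, with volume $\prod_i d_i^{l_i} / l_i!$, and that the Minkowski sum of $n + 1$ copies of a convex polytope has $(n + 1)^n$ times its volume. Under these assumptions the ratio of the sparse volume bound to the mixed volume is $(n + 1)^n / n!$, which is below $e^{n+1}$ because $k^k / k! < e^k$. All function names are hypothetical.

```python
from fractions import Fraction
from math import e, factorial

def newton_volume(ls, ds):
    """Volume of the product of simplices d_i * Delta_{l_i}."""
    vol = Fraction(1)
    for l, d in zip(ls, ds):
        vol *= Fraction(d ** l, factorial(l))
    return vol

def mixed_volume(ls, ds):
    """n-fold mixed volume of an unmixed system: n! * vol(N(P))."""
    return factorial(sum(ls)) * newton_volume(ls, ds)

def sparse_volume_bound(ls, ds):
    """Volume of the Minkowski sum of n + 1 copies of N(P)."""
    n = sum(ls)
    return (n + 1) ** n * newton_volume(ls, ds)

ls, ds = (1, 1, 1), (2, 2, 2)          # hypothetical type with n = 3
n = sum(ls)
ratio = sparse_volume_bound(ls, ds) / mixed_volume(ls, ds)
print(mixed_volume(ls, ds), sparse_volume_bound(ls, ds), float(ratio), e ** (n + 1))
```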

The entries in the Dixon matrix are, however, complicated in contrast to the entries in the sparse
resultant matrix and Macaulay matrix, for which the entries are either 0 or coefficients of the
monomials in the polynomials. In the next section, we analyze the complexity of constructing a Dixon
matrix.

4  Construction  of  the  Dixon  Matrix 

The Dixon matrix can be constructed by expanding the determinant of the cancellation matrix and
dividing it by $\prod_{i=1}^{n} (x_i - \bar{x}_i)$. Since the entries of the cancellation matrix are multivariate polynomials,
direct expansion can be quite expensive and can give rise to large intermediate expressions. This can be
avoided by using dense multivariate polynomial interpolation [14].

Interpolation works in stages. In each stage, the polynomial is interpolated in one more variable using
bounds on the number of terms in the polynomial (when viewed as a polynomial in the variables
which have already been lifted) and its degree in the variable being lifted. Since the Dixon polynomial
has $2n$ variables, namely $X$ and $\bar{X}$, $2n$ stages are needed. Using the analysis in Section 3 (Corollary 3.5),
tight bounds on the number of terms in the Dixon polynomial, and its degree in different variables,
can be derived and used in each stage of interpolation. The complexity of our variant of interpolation
depends on the volume of the polytopes since that is the quantity which determines the number of
terms in the Dixon polynomial, and hence, the cost of each stage. The complexity of construction
for multi-homogeneous systems is given by the following theorem whose proof is in Appendix B.
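The basic building block of each stage is univariate interpolation from sample values. The sketch below shows one quadratic-time way to do this, using Newton's divided differences followed by expansion to monomial coefficients over exact rationals. It is a generic illustration, not the interpolation variant analyzed in this paper, and the function name `interpolate` is an assumption.

```python
from fractions import Fraction

def interpolate(xs, ys):
    """Coefficients (lowest degree first) of the unique polynomial of degree
    < len(xs) that takes the value ys[i] at the distinct point xs[i]."""
    n = len(xs)
    coef = [Fraction(y) for y in ys]
    # Divided differences, computed in place: O(n^2) operations.
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Expand the Newton form by Horner's rule: O(n^2) operations.
    poly = [Fraction(0)] * n
    poly[0] = coef[n - 1]
    degree = 0
    for k in range(n - 2, -1, -1):
        # poly <- poly * (x - xs[k]) + coef[k]
        for i in range(degree + 1, 0, -1):
            poly[i] = poly[i - 1] - xs[k] * poly[i]
        poly[0] = -xs[k] * poly[0] + coef[k]
        degree += 1
    return poly

# Recover 2x^2 - 3x + 1 from its values at x = 0, 1, 2.
print(interpolate([0, 1, 2], [1, 0, 3]))
```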

Theorem 4.1 The construction of the Dixon matrix of a multi-homogeneous system, P, requires
$O(mvol(P)^3)$ arithmetic operations.

The  use  of  interpolation  to  construct  Dixon  matrices  is  possible  because  its  entries  are  coefficients 
of  a  polynomial,  which  can  be  interpolated  from  input  polynomials.  This  is  not  possible  for  sparse 
resultant  matrix  since  its  entries  do  not  have  such  a  relationship  with  the  input  polynomials. 

4.1  Total  Cost  of  Computing  the  Resultant  using  Dixon  Formulation 

Computing the resultant using the Dixon formulation involves two steps: (i) constructing the Dixon
matrix, and (ii) computing the determinant of the Dixon matrix. For a fixed number of variables, the
size of the Dixon matrix is proportional to the n-fold mixed volume; thus, computing its determinant
takes at most $O(mvol(P)^3)$ arithmetic operations. From the above theorem, the construction cost,
for a fixed number of variables, is $O(mvol(P)^3)$. Consequently, for a fixed number of variables, the
total cost of computing the resultant of a multi-homogeneous system using the Dixon formulation is
$O(mvol(P)^3)$.

4.2  Comparison  with  Sparse  Resultant  Formulation 

The construction cost of the sparse resultant matrix involves two measures of sparsity. The first is the mixed
volume, which we have used throughout this paper, and the other is the number of terms with non-zero
coefficients in the polynomials. The following theorem from [7] gives the cost of constructing
the sparse resultant matrix.

Fact 4.2 If the number of monomials with non-zero coefficients in each polynomial of a multi-homogeneous
system P is m, then the complexity of constructing its sparse resultant matrix is
O  {m  mvol  (P)  j . 

The  sparse  resultant  formulation  has  an  advantage  that  its  construction  cost  is  sensitive  to  both, 
the  mixed  volume,  as  well  as  the  cardinality  of  the  support  of  input  polynomials  (the  traditional 
measure  of  sparsity).  The  construction  of  sparse  resultant  matrix  can,  in  certain  cases,  take  less  time 
than  the  construction  of  the  Dixon  matrix.  However,  even  in  these  cases,  total  cost  of  computing 
the  resultant  using  sparse  resultant  formulation  is  more  than  Dixon  formulation  because  the  size  of 
sparse  resultant  matrix  is  unaffected  by  the  cardinality  of  support. 

We consider two extreme cases: (i) full polynomials, in which all terms inside the Newton polytope
have non-zero coefficients, and (ii) hollow polynomials, in which only the terms on the boundary of
the polytopes have non-zero coefficients.

Column groups: Ex; application; No. of Polys; then Dixon, Macaulay and Sparse entries (Matrix Size
and times, including GCP Time and RSC Time), listed in extraction order. Entries that are illegible
in the source are shown as "?".

Ex   Application       No. of Polys   Dixon, Macaulay and Sparse entries
1    Geom. Reason.     3              13, 0.18, 342, 55, *, 4608, 34, 566
2    Formula Deriv.    6              14, 0.36, 76, 330, *, 86, 9.8, 4094
3    Formula Deriv.    3              5, 0.08, ?, 15, ?, *, 14, 0.2
4    Geometry          3              4, 0.08, 15, ?, 14, 0.2, *
5    Random            4              6, 0.06, ?, 84, 27, 0.7, *
6    ?                 3              10, 0.1, 0.58, 45, ?, 16, 0.2, 0.71
7    ?                 3              18, 0.45, 47, 66, ?, 55, 2, 519
8    ?                 3              18, 9.33, ?, 153, *, 54, 1.9, 1369
9    Random            4              7, 0.21, 220, 56, *, 24, 1.2
10   Random            5              36, 1.1, 1365, ?, 151, 40
11   Implicitization   3              11, 0.06, ?, 36, ?, 33, 0.7, 31.04
12   Equilibrium       4              5, 0.06, ?, 20, ?, ?, 20, 0.7, 2.36
13   Vision            6              6, 3.3, ?, 792, ?, 60, 248, 10.53
14   Comput. Bio.      3              8, 0.27, 28, ?, 0.68, 16, 0.33

Table 1: Empirical Data


Full Systems: In a full system $P$, $m = O(vol(P))$. Using this, the complexity of constructing
its  sparse  resultant  matrix  is  O  {mvol{P)f  -^y  For  a  fixed  number  of  variables,  since  the 

cost of computing the determinant of the sparse resultant matrix is $O(mvol(P)^3)$ (from Theorem
3.9), the construction cost dominates the total cost of computing the resultant. Hence, the total cost
of computing the resultant of a full multi-homogeneous system using the sparse resultant formulation
is  0{rnvol{P)^'^).  However,  the  Dixon  formulation  (Section  4.1)  takes  0{mvol{P)^)  arithmetic 
operations,  which  is  better  than  sparse. 

Hollow Systems: In this case, we can assume that $m = O(1)$ in Fact 4.2. So the construction cost
of  the  sparse  resultant  matrix,  in  this  case,  is  proportional  to  the  n-fold  mixed  volume.  This  is  better 
than  the  construction  cost  of  the  Dixon  matrix,  and  if  only  a  resultant  matrix  is  desired,  the  sparse 
formulation  is  preferable.  However,  while  computing  the  resultant,  the  determinant  computation 
step  dominates.  Because  of  larger  size  of  the  sparse  resultant  matrix,  the  Dixon  formulation  again 
computes  the  resultant  faster. 

In both cases, more so in the full case, the Dixon formulation is less expensive than the sparse
resultant formulation. To see typical timings in practice, see Table 1 (from [10]), in which the
timings of the Dixon, sparse and Macaulay formulations are given on 14 examples, mostly from robotics,
vision, geometry and graphics. The coefficients in these systems are polynomials in parameters and
not just integers. For the sparse and Dixon formulations, the matrix construction times as well as the
determinant computation time (called RSC, which uses interpolation to compute the determinant
of the resultant matrices) are given separately. We wish to point out that, for implementational
simplicity, we used direct expansion of the cancellation matrix for constructing the Dixon matrix,
and this has worse complexity than the interpolation method discussed in this paper. For the Macaulay
formulation, we give two timings, one using the Generalized Characteristic Polynomial (GCP) method of Canny [2],
and the other using a method similar to the one in Theorem 2.2. As can be observed, the Dixon matrix is
much smaller, and the Dixon formulation performs much better than the others and was able to solve all
14 problems reasonably well. This work is the result of an effort to seek theoretical justification for
these empirical observations.




References 

[1] Bernshtein D.N., The Number of Roots of a System of Equations, Funktsional'nyi Analiz i Ego
Prilozheniya, 9(3):1-4.

[2] Canny J., Generalized Characteristic Polynomials, Journal of Symbolic Computation, 9:241-250.

[3] Canny J., Pedersen P., An Algorithm for Newton Resultant, Technical Report, Cornell University,
CornellCS:TR93-1394, 1993.

[4] Dixon A.L., The eliminant of three quantics in two independent variables, Proc. London
Mathematical Society, 6, 1908, 468-478.

[5] Ehrhart E., Sur un probleme de geometrie diophantienne, I. Polyedres et reseaux, J. Reine Angew.
Math., 226:1-29, 1967.

[6] Gelfand I.M., Kapranov M.M., Zelevinsky A.V., Discriminants, Resultants and Multidimensional
Determinants, Birkhauser, Boston, 1994.

[7] Emiris I., Sparse Elimination and Applications in Kinematics, Doctoral dissertation,
Department of Computer Science, University of California, Berkeley, 1994.

[8] Hoffmann C.M., Algebraic and Numerical Techniques for Offsets and Blends, Computation of
Curves and Surfaces, W. Dahmen et al. (eds.), Kluwer Acad. Pub., 1990, 499-528.

[9] Kapur D., Saxena T., Yang L., Algebraic and Geometric Reasoning using Dixon Resultants,
Proc. ACM ISSAC 94, Oxford, England, July 1994.

[10] Kapur D., Saxena T., Comparison of Various Multivariate Resultants, To appear in Proc. ACM
ISSAC 95, Montreal, Canada, July 1995.

[11] Sturmfels B., Sparse Elimination Theory, Proc. Computat. Algebraic Geom. and Commut.
Algebra, D. Eisenbud and L. Robbiano, eds., Cortona, Italy, June 1991.

[12] Macaulay F.S., The Algebraic Theory of Modular Systems, Cambridge Tracts in Math. and
Math. Phys., 19, 1916.

[13] Weyman J. and Zelevinsky A., Determinantal Formulas for Multigraded Resultants, Journal of
Algebraic Geometry, pp. 569-597, 1994.

[14] Zippel R., Effective Polynomial Computation, Kluwer Academic Publishers, Boston, 1993.




A  Proof  of  Theorem  3.8 


Let $P$ be a multi-homogeneous polynomial set of type $(l_1, \ldots, l_r; d_1, \ldots, d_r)$ with $n + 1$ polynomials,
where $n = \sum_{i=1}^{r} l_i$ is the total number of variables. Then,

$$vol(\mathcal{N}(P)) = \prod_{i=1}^{r} \frac{d_i^{l_i}}{l_i!},$$

$$mvol(P) = n!\, vol(\mathcal{N}(P)),$$

Size of Dixon matrix = $O(\,[\text{factor illegible in source}] \cdot vol(\mathcal{N}(P))\,)$.

Assuming $l_i = O(l)$, and using the Stirling approximation for asymptotically large $r$, the proof of Theorem
3.8 follows.


B  Proof  of  Theorem  4.1 

The polytopes of all multi-homogeneous systems with $n$ variables and partition degree $d$ are subsets
of the polytope of the system $(1, \ldots, 1; d, \ldots, d)$. Since the size of the Dixon matrix depends only on
the input polytopes, the Dixon matrix is the largest for $(1, \ldots, 1; d, \ldots, d)$. Consequently, the cost of
constructing the Dixon matrix for such a system bounds the cost of constructing the Dixon matrix
for all other systems with $n$ variables and partition degree $d$.

Let the $n$ variables of the system $(1, \ldots, 1; d, \ldots, d)$ be $x_1, \ldots, x_n$ and the corresponding substitution
variables be $\bar{x}_1, \ldots, \bar{x}_n$. For $1 \le i \le n$, one can easily derive from the analysis in Section 3
that for the system $(1, \ldots, 1; d, \ldots, d)$:

1. The degree of $\delta_P$ in $x_i$ is $id - 1$, and hence, the degree in $\bar{x}_{n-i+1}$ is also $id - 1$.

2. The number of terms in the variables $x_1, \ldots, x_i$ (regarding all other variables as constants) is
bounded by $i!\, d^i$.
We use dense multivariate interpolation to compute the Dixon polynomial (which will directly
give us the Dixon matrix). Given $D + 1$ values of a univariate polynomial, interpolating them to
produce the polynomial takes $I(D) = O(D^2)$ time [14]. Let the cost of computing a single value
of the univariate polynomial from the cancellation matrix be $B$. Computing a value requires (i)
substituting values for each variable in each entry of the cancellation matrix, which takes $O(n^2 d^n)$
arithmetic operations, and (ii) computing its determinant, which takes $O(n^3)$ operations. Hence,
$B = O(n^2 d^n)$.

Multivariate interpolation works in stages; in each stage one variable is lifted. Since the Dixon
polynomial has $2n$ variables, namely $X$ and $\bar{X}$, we need $2n$ stages to interpolate it. We lift the
variables in the order $x_1, \bar{x}_n, x_2, \bar{x}_{n-1}, \ldots, x_n, \bar{x}_1$. We call the $k$-th variable in this
sequence $y_k$, e.g. $y_3 = x_2$ and $y_4 = \bar{x}_{n-1}$. Let $N_k$ be the complexity of the first $k$ stages. Now, the
$k$-th stage involves (i) interpolating the polynomial in $k - 1$ variables as many times as the degree of the
Dixon polynomial in $y_k$ plus one, and (ii) interpolating as many univariate polynomials (in $y_k$) as the
number of terms of the Dixon polynomial in the variables $y_1, \ldots, y_{k-1}$. Since, in the first stage, one
needs to compute the values of the Dixon polynomial directly by computing the determinant of the
Dixon matrix, we get the following recurrence relations for our $2n$ stages:


\[ N_1 = dB + I(d), \]
\[ N_k = k_c\, d\, N_{k-1} + \left(k_f!\, d^{k_f}\right)\left((k_c - 1)!\, d^{k_c - 1}\right) I(k_c d), \]
where $k_c = \lceil k/2 \rceil$ and $k_f = \lfloor k/2 \rfloor$. The solution to this recurrence is

JV,„  =  j  =  o  j  =  O  , 

therefore. 


and  the  theorem  is  proved 


N 


2n 


mvol(P)^ 


y^\3fpn  jjl 
Abstract

Dixon’s method for computing multivariate resultants by simultaneously
eliminating many variables is reviewed. The method is found to be quite
restrictive because often the Dixon matrix is singular, and the Dixon
resultant vanishes identically, yielding no information about solutions for
many algebraic and geometric problems. We extend Dixon’s method to the case
when the Dixon matrix is singular but satisfies a condition. An efficient
algorithm is developed based on the proposed extension for extracting
conditions for the existence of affine solutions of a finite set of
polynomials. Using this algorithm, numerous geometric and algebraic
identities are derived for examples which appear intractable with other
techniques of triangulation such as the successive resultant method, the
Gröbner basis method, Macaulay resultants and the characteristic set method.
Experimental results suggest that the resultant of a set of polynomials which
are symmetric in the variables is relatively easier to compute using the
extended Dixon’s method.

1  Introduction 

There exist many different techniques for solving a system of algebraic
polynomial equations. Resultant computations are still perhaps the most
popular way to get information about solutions of polynomial equations. The
Sylvester resultant method is the most studied and used technique for
determining a common solution of two polynomial equations in one variable.
Implementations of Sylvester resultants are supported in all computer algebra
systems including Mathematica, Maple, Reduce and Macsyma.

Successive Sylvester resultant computations can be used to solve a system of
polynomial equations in many variables by eliminating variables one at a
time. However, the performance of successive resultant techniques is very
sensitive to the ordering of variables. Human intervention is required to
determine the most efficient ordering, so they are not automatic methods.
Successive elimination methods are also very inefficient; in fact, it is more
efficient to directly compute a condition for the existence of common zeros
without eliminating variables one at a time. Macaulay’s multivariate
resultant method is one such method, and it has recently been popularized
[3]. Unfortunately, efficient implementations of Macaulay’s method are not
available in any computer algebra system. Other alternatives such as the
Gröbner basis [2] and characteristic set methods [14] [6] do not seem to work
well either.

Dixon’s method is an efficient method to simultaneously eliminate variables
from a system of nonhomogeneous polynomial equations, but unfortunately, it
does not work for most algebraic and geometric problems. The main result
reported in Dixon’s original paper [7] was a method to obtain the resultant
of a set of three generic bidegree polynomials. This is a generalization of
Cayley’s formulation [4] of Bezout’s efficient method for computing the
resultant of two univariate polynomials. Dixon wrote that his method
generalized to any $n + 1$ generic ndegree¹ polynomials in $n$ variables. That
is, his method gives the resultant of $n + 1$ generic ndegree polynomials. For
an arbitrary set of $n + 1$ nonhomogeneous polynomials in $n$ variables (i.e.
not necessarily generic and ndegree), his method gives a necessary condition
(henceforth called the Dixon resultant) for the existence of a common affine
zero.

For most problems which arise in geometry, the matrix set up in Dixon’s
method, the Dixon matrix, becomes singular. As a consequence, the Dixon
resultant vanishes identically, without providing any information about the
common solutions of the equations. This is perhaps the reason that Dixon’s
method has not been widely used, even though it is quite efficient. Chionh in
[5] suggested using perturbation of certain coefficients to obtain nonzero
conditions, but this is a nonautomatic method and requires human expertise.
Canny in [3] defined the Generalized Characteristic Polynomial for Macaulay
resultants, which is a systematic way of perturbing a polynomial system so
that one gets nonzero conditions for the existence of affine solutions. This
can also be achieved for Dixon resultants by mechanically perturbing the
polynomials to be ndegree, but this leads to a larger Dixon matrix with
larger entries too. Computing the determinant of such a matrix is cumbersome
and leads to inefficiency.

¹ $n + 1$ nonhomogeneous polynomials $p_1, \ldots, p_{n+1}$ in $x_1, \ldots, x_n$
are said to be generic ndegree if there exist nonnegative integers
$m_1, \ldots, m_n$ such that
\[ p_j = \sum_{i_1=0}^{m_1} \cdots \sum_{i_n=0}^{m_n} a_{j,i_1,\ldots,i_n}\, x_1^{i_1} \cdots x_n^{i_n} \quad \text{for all } 1 \le j \le n + 1. \]

In this paper, we overcome these restrictions by extracting nonzero
conditions directly from the Dixon matrix of the system of polynomials. The
proposed method does not involve any perturbation; instead we identify and
prove a condition on singular Dixon matrices under which nonzero necessary
conditions for the existence of a common solution for a system of equations
can be extracted. Based on this result, an algorithm for computing the
nonzero necessary conditions for the existence of affine common zeros of a
system of polynomials directly from its Dixon matrix is developed. It is
found that most of the problems for which Dixon’s original method was
inapplicable can be solved using our extension of it. Moreover, unlike
perturbation techniques [3], the proposed method does not introduce any new
variables or terms into the system of polynomials. Because of this,
computation is not made any more difficult. Finally, unlike successive
elimination techniques and the perturbation techniques of [5], our method is
fully automatic and does not require any human intervention.

We successfully proved many nontrivial algebraic and geometric identities and
theorems within a few minutes (the maximum time for any example was less than
9 minutes) using a naive implementation of our algorithm in MAPLE. We mention
five such examples in this paper. Two of these five examples seem intractable
by implementations of most other techniques of elimination. The other three
examples also take too long to compute using other techniques.
Implementations of both Sylvester’s resultant and Bezout’s resultant in MAPLE
were tried to perform successive resultant computations, but they ran up to a
day before running out of memory on all examples. We also tried to compute
the lexicographic Gröbner basis on MACAULAY (Bayer and Stillman [1]) as well
as MAPLE, but they too ran out of memory after running up to a day. Under
block orderings, MACAULAY could not compute the resultant of two examples.
For the other three that it could solve, it took substantially longer than
our algorithm. Macaulay resultants (GCP if the Macaulay matrix is singular
[3]) were also implemented and tried on MAPLE, but they failed for three
examples and took longer on the other two. Finally, perturbation techniques
similar to the GCP of Canny were tried to obtain nonsingular Dixon matrices,
but perturbation of polynomials resulted in a blowup of the order of the
Dixon matrix due to additional terms and consequently, the determinant
computation could not be successfully performed on MAPLE. These preliminary
findings, though far from complete, are very encouraging and suggest that
further studies are needed to examine methods based on Dixon resultants.

In the next section, we review Cayley’s formulation of Bezout’s method for
two univariate polynomials, and Dixon’s method for any $n + 1$ generic ndegree
polynomials in $n$ variables. Section 2.3 outlines the shortcomings of Dixon’s
method and the reason for its inapplicability to most geometric problems. In
section 3, an algorithm is developed which overcomes this restriction and
extracts a nonzero necessary condition for the existence of affine zeros from
the Dixon matrix, even when it is singular. In section 4, two detailed
examples are presented to illustrate the application of our method. In
section 5, the advantages of this method over other techniques are discussed,
and finally, in section 6, some empirical results are reported.


2  Review  of  Dixon  Resultants 

In the next subsection, we describe Cayley’s method to determine the
resultant of two univariate polynomials. In the subsection following that, we
describe an extension by Dixon which gives a general method to determine the
resultant of $n + 1$ generic ndegree polynomials in $n$ variables. A detailed
exposition of Dixon’s method can be found in [12].


2.1  Cayley's Method for Two Polynomials in One Variable

Even though the analysis in this section was developed by Cayley in [4], we
will mostly use Dixon’s name while developing the notation so that a uniform
notation can be carried to the generalization that is presented in the next
subsection.

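Consider a set $\mathcal{F}$ of two univariate polynomials $\{p_1(x_1), p_2(x_1)\}$.
Let $d_{max} = \max(degree(p_1, x_1),\ degree(p_2, x_1))$, where by
$degree(p_i, x_1)$ we mean the maximum degree of $x_1$ in $p_i$. Consider the
polynomial

\[ \Delta(x_1, \alpha_1) = \begin{vmatrix} p_1(x_1) & p_2(x_1) \\ p_1(\alpha_1) & p_2(\alpha_1) \end{vmatrix}, \]

where $\alpha_1$ is a new variable and $p_i(\alpha_1)$ stands for uniformly replacing
$x_1$ by $\alpha_1$ in $p_i$. Making $x_1 = \alpha_1$ would make $\Delta = 0$, which means
that $x_1 - \alpha_1$ divides $\Delta$. Let

\[ \delta(x_1, \alpha_1) = \frac{\Delta(x_1, \alpha_1)}{x_1 - \alpha_1} = \frac{p_1(x_1)\,p_2(\alpha_1) - p_2(x_1)\,p_1(\alpha_1)}{x_1 - \alpha_1}. \]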

We call $\delta$ the Dixon polynomial. The polynomial $\delta$ is a polynomial of
degree $d_{max} - 1$ in $\alpha_1$ and is symmetric in $x_1$ and $\alpha_1$. Every common
zero of $p_1(x_1)$ and $p_2(x_1)$ is a zero of $\delta(x_1, \alpha_1)$ no matter what
value $\alpha_1$ has; thus at a common zero of $p_1$ and $p_2$, the coefficient of
every power product of $\alpha_1$ in $\delta(x_1, \alpha_1)$ must be 0. This gives a set
(say $\mathcal{E}'$) of $d_{max}$ equations corresponding to the coefficients of all
the power products of $\alpha_1$ (viz. $\alpha_1^i$ for each $0 \le i \le d_{max} - 1$),
each of which is a polynomial of degree $d_{max} - 1$ in $x_1$. So, if $D$ is the
$d_{max} \times d_{max}$ coefficient matrix of $\mathcal{E}'$, then

\[ \mathcal{E}' = D \begin{pmatrix} 1 \\ x_1 \\ \vdots \\ x_1^{d_{max}-1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

If each power product of $x_1$ (including $x_1^0 = 1$) is viewed as a new
variable $v_i$, then we get a set $\mathcal{E}$ of $d_{max}$ homogeneous linear
equations in $d_{max}$ variables:

\[ \mathcal{E} = D \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_{d_{max}} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

If a common affine zero exists for $\mathcal{F}$ (say $x_1 = c_1$), then this is a
solution for $\mathcal{E}'$ also. This results in a nontrivial solution for
$\mathcal{E}$, viz. $v_1 = 1, v_2 = c_1, v_3 = c_1^2, \ldots, v_{d_{max}} = c_1^{d_{max}-1}$.
Hence if $\mathcal{F}$ has a common affine zero, then $\mathcal{E}$ has a nontrivial
solution, implying that the determinant of $D$ is zero. This gives a necessary
condition on the coefficients of $p_1$ and $p_2$ for them to have a common zero.


It was proved by Cayley in [4] that vanishing of the determinant of $D$, the
Dixon resultant of $\mathcal{F}$, is a necessary and sufficient condition for
$\mathcal{F}$ to have a nontrivial common projective zero. In fact, the Bezout
matrix coincides with the Dixon matrix for the univariate case.
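The univariate construction can be sketched in a few lines of Python. This is an illustrative sketch, not the MAPLE implementation mentioned in the introduction: the function names `dixon_matrix_univariate` and `det` are hypothetical, polynomials are assumed to be given as coefficient lists (lowest degree first, no trailing zeros), and exact rational arithmetic is used in place of symbolic coefficients. It relies on the identity $(x^i\alpha^j - x^j\alpha^i)/(x - \alpha) = \sum_{t=0}^{i-j-1} x^{j+t}\alpha^{i-1-t}$ for $i > j$.

```python
from fractions import Fraction


def dixon_matrix_univariate(p1, p2):
    """Return the d x d matrix D with D[r][c] = coefficient of alpha^r * x^c
    in delta(x, alpha) = (p1(x)p2(alpha) - p2(x)p1(alpha)) / (x - alpha)."""
    d = max(len(p1), len(p2)) - 1
    a = [Fraction(c) for c in p1] + [Fraction(0)] * (d + 1 - len(p1))
    b = [Fraction(c) for c in p2] + [Fraction(0)] * (d + 1 - len(p2))
    D = [[Fraction(0)] * d for _ in range(d)]
    for i in range(d + 1):
        for j in range(i):
            # c is the coefficient of x^i alpha^j in the numerator Delta.
            c = a[i] * b[j] - b[i] * a[j]
            # Divide c * (x^i alpha^j - x^j alpha^i) by (x - alpha).
            for t in range(i - j):
                D[i - 1 - t][j + t] += c
    return D


def det(M):
    """Determinant by fraction-exact Gaussian elimination."""
    M = [row[:] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return result


# x^2 - 3x + 2 and x^2 - 1 share the zero x = 1, so the determinant vanishes.
D_common = dixon_matrix_univariate([2, -3, 1], [-1, 0, 1])
# x^2 - 3x + 2 and x^2 - 9 share no zero, so the determinant is nonzero.
D_disjoint = dixon_matrix_univariate([2, -3, 1], [-9, 0, 1])
```

For the first pair the matrix is [[3, -3], [-3, 3]], which annihilates the vector (1, x) at x = 1; for the second pair the determinant is -40, which agrees up to sign with the resultant of the two polynomials.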

2.2  Dixon's Generalization to Two and More Variables

Dixon explicitly generalized Cayley’s method presented in the previous
subsection to the two variable case in [7], but it can be easily generalized
to any number of variables (and Dixon alluded to this), so we present this
generalization under Dixon’s name too.

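Let $\mathcal{F} = \{p_1(x_1, \ldots, x_n), \ldots, p_{n+1}(x_1, \ldots, x_n)\}$ be the
set of $n + 1$ generic ndegree polynomials in $n$ variables. Let

\[ dmax_i = \max(degree(p_1, x_i), \ldots, degree(p_{n+1}, x_i)) \quad \text{for all } 1 \le i \le n. \]

We form an $(n + 1) \times (n + 1)$ determinant similar to the determinant in
the last section. Let this determinant be defined as follows:

\[ \Delta(x_1, \ldots, x_n, \alpha_1, \ldots, \alpha_n) =
\begin{vmatrix}
p_1(x_1, x_2, \ldots, x_n) & \cdots & p_{n+1}(x_1, x_2, \ldots, x_n) \\
p_1(\alpha_1, x_2, \ldots, x_n) & \cdots & p_{n+1}(\alpha_1, x_2, \ldots, x_n) \\
p_1(\alpha_1, \alpha_2, \ldots, x_n) & \cdots & p_{n+1}(\alpha_1, \alpha_2, \ldots, x_n) \\
\vdots & & \vdots \\
p_1(\alpha_1, \alpha_2, \ldots, \alpha_n) & \cdots & p_{n+1}(\alpha_1, \alpha_2, \ldots, \alpha_n)
\end{vmatrix} \]

where $\alpha_1, \ldots, \alpha_n$ are new variables and
$p_i(\alpha_1, \ldots, \alpha_k, x_{k+1}, \ldots, x_n)$ stands for uniformly replacing
$x_j$ by $\alpha_j$ for all $1 \le j \le k$ in $p_i$.

Each of $x_i = \alpha_i$, for all $1 \le i \le n$, is a zero of $\Delta$, so they can
be removed by dividing $\Delta$ by $\prod_{i=1}^{n}(x_i - \alpha_i)$. Let

\[ \delta(x_1, \ldots, x_n, \alpha_1, \ldots, \alpha_n) = \frac{\Delta(x_1, \ldots, x_n, \alpha_1, \ldots, \alpha_n)}{(x_1 - \alpha_1) \cdots (x_n - \alpha_n)}. \]

The polynomial $\delta$, which is the Dixon polynomial, is of degree
$((n + 1 - i) \times dmax_i) - 1$ in $\alpha_i$ and $(i \times dmax_i) - 1$ in $x_i$ for
all $1 \le i \le n$.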

Any common zero of $\mathcal{F}$ (say $x_1 = c_1, \ldots, x_n = c_n$) makes the Dixon
polynomial vanish, no matter what the values of $\alpha_1, \ldots, \alpha_n$ are; hence
all the coefficients of the various power products of $\alpha_1, \ldots, \alpha_n$ in
the Dixon polynomial vanish. Let $\mathcal{E}'$ be the set of all the polynomials
in $x_1, \ldots, x_n$ which are coefficients of the power products of
$\alpha_1, \ldots, \alpha_n$ in $\delta$. This set then has exactly

\[ \prod_{i=1}^{n} \bigl((n + 1 - i) \times dmax_i\bigr) = n! \times \prod_{i=1}^{n} dmax_i = s \]

equations (one for each power product of $\alpha_1, \ldots, \alpha_n$), each of which
is of degree $(i \times dmax_i) - 1$ in $x_i$. Also, there are

\[ \prod_{i=1}^{n} (i \times dmax_i) = n! \times \prod_{j=1}^{n} dmax_j = s \]

power products in $x_1, \ldots, x_n$ in the equations of $\mathcal{E}'$. Let $D$ be the
$s \times s$ coefficient matrix of $\mathcal{E}'$. Then

\[ \mathcal{E}' = D \begin{pmatrix} 1 \\ x_1 \\ x_2 \\ \vdots \\ x_1^{dmax_1 - 1} x_2^{2\,dmax_2 - 1} \cdots x_n^{n\,dmax_n - 1} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

If each power product of $x_1, \ldots, x_n$ (including $x_1^0 \cdots x_n^0 = 1$) is
viewed as a new variable $v_i$, then we get a set $\mathcal{E}$ of $s$ homogeneous
linear equations in $s$ variables:

\[ \mathcal{E} = D \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_s \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]

If a common affine zero exists for $\mathcal{F}$ (say $x_1 = c_1, \ldots, x_n = c_n$),
then this is a solution for $\mathcal{E}'$ also. This results in a nontrivial
solution for $\mathcal{E}$, viz. $v_1 = 1, v_2 = c_1, v_3 = c_2, \ldots,
v_s = c_1^{dmax_1 - 1} c_2^{2\,dmax_2 - 1} \cdots c_n^{n\,dmax_n - 1}$. Hence, as
before, if $\mathcal{F}$ has a common affine zero, then $\mathcal{E}$ has a nontrivial
solution, implying that the determinant of $D$ is zero. This gives a necessary
condition on the coefficients of $p_1, \ldots, p_{n+1}$ for them to have a common
zero. As before, $D$ is called the Dixon matrix and its determinant the Dixon
resultant.

It was proved by Dixon in [7] that for $n + 1$ generic ndegree polynomials in
$n$ variables, vanishing of the Dixon resultant is a necessary and sufficient
condition for them to have a nontrivial projective zero, or a necessary
condition for the existence of an affine zero. Moreover, $det(D)$ is not
identically zero. Recall that generic ndegree means that there exists a set of
$n$ integers $k_1, \ldots, k_n$ such that all polynomials $p_1, \ldots, p_{n+1}$ are of
the kind

\[ p_j = \sum_{i_1=0}^{k_1} \cdots \sum_{i_n=0}^{k_n} a_{j,i_1,\ldots,i_n}\, x_1^{i_1} \cdots x_n^{i_n}, \quad 1 \le j \le n + 1, \]

where each $a$ is a distinct indeterminate. So for such a set of polynomials,
the Dixon resultant is a polynomial in all of the $a$’s that is not identically
zero but vanishes for those particular values of the $a$’s (from the algebraic
closure of the field of rational numbers, say) for which $\mathcal{F}$ has a common
affine zero. Next we discuss the case when the polynomials are not generic and
ndegree.


2.3  Case  of  Arbitrary  Polynomials 

Consider  the  following  three  polynomials  in  two  variables. 

$p_1 = x^2 + axy - y + b$
$p_2 = xy + (a + b)y - c$
P3  =  +  p  -f  2ac 


Here $a$, $b$ and $c$ are parameters. This set of polynomials is not generic
(because, for example, the coefficient of $x^2$ in $p_1$, namely 1, is not an
independent parameter, and the coefficient of $y$ in $p_2$ is not independent
since it depends on the coefficients of $p_1$). This set is also not ndegree
(because, for example, even though the maximum degrees of $x$ and $y$ are two
and one respectively, there is no term $x^2 y$ in any of the polynomials). So
what must one do in this case? The analysis in the previous section was
developed for generic ndegree polynomials, so it would not work directly for
this set of polynomials. It is possible to write down the generic polynomials
of ndegree (2, 1) in $x$ and $y$:

\[ p_1' = a_1 x^2 y + a_2 x^2 + a_3 xy + a_4 x + a_5 y + a_6 \]
\[ p_2' = a_7 x^2 y + a_8 x^2 + a_9 xy + a_{10} x + a_{11} y + a_{12} \]
\[ p_3' = a_{13} x^2 y + a_{14} x^2 + a_{15} xy + a_{16} x + a_{17} y + a_{18} \]

Now construct its $4 \times 4$ Dixon matrix $D$. Compute its determinant $det(D)$
and substitute the values $a_1 = 0, a_2 = 1, a_3 = a, a_4 = 0, a_5 = -1, \ldots,
a_{18} = 2ac$ into $det(D)$. This would give a polynomial in $a$, $b$ and $c$ (call
it $p_{dix}$). The polynomial $p_{dix}$ vanishes at all those values of $a$, $b$ and
$c$ at which $p_1$, $p_2$ and $p_3$ have a common affine zero. This same approach
could be followed for any arbitrary set of polynomials $\mathcal{F}$ as follows:

1.  Construct generic ndegree polynomials $\mathcal{F}^*$ which, when specialized
in a certain way (say by a mapping $\psi$ relating the parameters of
$\mathcal{F}^*$ to the parameters of $\mathcal{F}$), become $\mathcal{F}$.

2.  Find the Dixon resultant of $\mathcal{F}^*$.

3.  Specialize the parameters in this Dixon resultant polynomial using $\psi$
to get the polynomial $p_{dix}$.

When this approach is followed, often $p_{dix}$ becomes identically zero. This
is something which was avoided while working with generic ndegree polynomials.
More importantly, this approach also results in inefficiency because
$\mathcal{F}^*$ has larger polynomials (substantially in some cases) than
$\mathcal{F}$.

We have adopted a different approach in this paper. Remove the restriction of
generic ndegree from the construction of the previous section. So now, we are
given a set of arbitrary polynomials. Construct $\Delta$ as in the last
subsection. Divide it by $(x_1 - \alpha_1) \cdots (x_n - \alpha_n)$ to get the Dixon
polynomial $\delta$. This polynomial may now have a degree less than or equal to
$((n + 1 - i) \times dmax_i) - 1$ in each $\alpha_i$ and less than or equal to
$(i \times dmax_i) - 1$ in each $x_i$ for all $1 \le i \le n$. Construct the set of
linear equations $\mathcal{E}$ from the Dixon polynomial as before, except that
now $\mathcal{E}$ may have at most $s$ equations in at most $s$ variables. We
continue to call the coefficient matrix of $\mathcal{E}$ the Dixon matrix $D$,
which may be an $s_1 \times s_2$ matrix where $s_1 \le s$ and $s_2 \le s$.

Notice though that the determinant of the Dixon matrix may not give us a
necessary condition for the existence of affine zeros of $\mathcal{F}$ because
$\mathcal{E}$ may not have a nontrivial solution even when $\mathcal{F}$ has a common
affine zero. This is because the Dixon matrix may not contain the column
corresponding to the monomial $x_1^0 \cdots x_n^0 = 1$, and hence if $\mathcal{F}$
has only $x_1 = 0, \ldots, x_n = 0$ as the common solution, then $\mathcal{E}$ only
has a trivial solution. Even if the Dixon matrix does contain the column
corresponding to $x_1^0 \cdots x_n^0$, it could be singular. Worst of all, the
Dixon matrix could be a rectangular matrix (because $s_1$ and $s_2$ may not be
the same), in which case there is not even the possibility of computing the
determinant. We deal with all of these possibilities together in the next
section.


3  Dealing  with  Singular  Dixon  Matrices 

First let us formalize the notion for this new framework of arbitrary
polynomials. Let there be $m$ parameters $a_1, \ldots, a_m$ and
$\mathcal{P} = \mathbb{Q}[a_1, \ldots, a_m]$, the ring of all polynomials in them
($\mathbb{Q}$ is the field of rational numbers). The problem being solved is:
given a set $\mathcal{F}$ of $n + 1$ polynomials from $\mathcal{P}[x_1, \ldots, x_n]$,
give a polynomial from $\mathcal{P}$ that is not identically zero, but which must
vanish for all those particular values of the parameters $a_1, \ldots, a_m$ from
$\bar{\mathbb{Q}}^m$ ($\bar{\mathbb{Q}}$ is the algebraic closure of $\mathbb{Q}$) for
which $\mathcal{F}$ has a common affine zero in $\bar{\mathbb{Q}}^n$. We will call any
such polynomial a Projection operator, $\varrho$.

Consider a slightly different problem from the one posed in the previous
paragraph, as follows: a set of constraints $C$ on the variables
$x_1, \ldots, x_n$ of the form $x_{j_1} \ne 0 \wedge \cdots \wedge x_{j_k} \ne 0$ is
also given, and we are looking for a polynomial in $\mathcal{P}$ that is not
identically zero, but vanishes on all those particular values of parameters
from $\bar{\mathbb{Q}}^m$ for which $\mathcal{F}$ has a common affine zero in
$\bar{\mathbb{Q}}^n$ which also satisfies the set of constraints $C$. We will call
this the Projection operator modulo $C$, $\varrho_C$.

If one is looking for a necessary condition $\varrho$ for the existence of
affine solutions, then $C$ can be $\emptyset$ and $\varrho = \varrho_{\emptyset}$. But
the problem of computing $\varrho_C$ is important in itself because in areas
such as geometric theorem proving, one occasionally encounters constraints
under which solutions are sought. So, we will now give a method which, given
the set of $n + 1$ polynomials $\mathcal{F} \subset \mathcal{P}[x_1, \ldots, x_n]$ and
the set of constraints $C$ on the variables, gives the required necessary
condition on the parameters which is not identically zero, provided the Dixon
matrix satisfies a certain condition.

Recall that the Dixon matrix is of dimension $s_1 \times s_2$, with
$s_1 \le n! \prod_{i=1}^{n} dmax_i$ rows (corresponding to the coefficients of all
power products of the $\alpha_i$ in the Dixon polynomial) and
$s_2 \le n! \prod_{i=1}^{n} dmax_i$ columns (corresponding to all power products of
the $x_i$ in the Dixon polynomial). Let the columns of the Dixon matrix be
denoted by $m_i$ ($m_1$ being the first column of the Dixon matrix and so on).
For any column $m_i$, let $monom(m_i)$ denote the monomial (or power product in
$x_1, \ldots, x_n$) corresponding to that column. Also, given a set of constraints
$C$, let $nvcol(C)$ denote the set of all columns $m_i$ such that
$C \Rightarrow monom(m_i) \ne 0$.

First let us establish a lemma about polynomials in which all coefficients
are from the algebraic closure $\bar{\mathbb{Q}}$ of the field of rational
numbers $\mathbb{Q}$. Given a set of $n + 1$ polynomials
$\mathcal{G} \subset \bar{\mathbb{Q}}[x_1, \ldots, x_n]$, let $N$ be the $s_1 \times s_2$
Dixon matrix of $\mathcal{G}$. Also let

$\mathcal{N}_1 = \{X \mid X$ is an $s_1 \times (s_2 - 1)$ submatrix of $N$ obtained
by deleting a column which belongs to $nvcol(C)$ from $N\}$.

Notice that if $C$ is $x_1 \ne 0 \wedge \cdots \wedge x_n \ne 0$ then $\mathcal{N}_1$
contains all the $s_1 \times (s_2 - 1)$ submatrices of $N$. Now we prove a lemma
relating the rank of the Dixon matrix $N$ to the ranks of all the matrices
which are elements of $\mathcal{N}_1$.

Lemma 3.1  If $\mathcal{G}$ has a common affine zero which satisfies $C$, then

\[ \forall X \in \mathcal{N}_1,\quad rank(X) = rank(N). \]

Proof:  Let $x_1 = c_1, \ldots, x_n = c_n$ be the common affine zero of
$\mathcal{G}$ which satisfies $C$. Since each of the $s_2$ variables in the set of
$s_1$ homogeneous linear equations $\mathcal{E}$ stands for a monomial in
$x_1, \ldots, x_n$, this solution for $\mathcal{G}$ generates a solution for
$\mathcal{E}$, say $v_1 = C_1, v_2 = C_2, \ldots, v_{s_2} = C_{s_2}$. So, if
$m_1, \ldots, m_{s_2}$ are the $s_1 \times 1$ column vectors of the Dixon matrix $N$,
then

\[ C_1 m_1 + C_2 m_2 + \cdots + C_i m_i + \cdots + C_{s_2} m_{s_2} = 0. \]

Let $X$ be any element of $\mathcal{N}_1$. Then $X$ was obtained from $N$ by
deleting some column of $N$ (say $m_i$) which belonged to $nvcol(C)$. But this
means that $monom(m_i)$ cannot vanish on any solution which satisfies $C$, or
in other words, $C_i \ne 0$. So dividing the above equation by $C_i$ and
rearranging terms on both sides, we get the equation:

\[ m_i = \frac{-C_1}{C_i} m_1 + \frac{-C_2}{C_i} m_2 + \cdots + \frac{-C_{i-1}}{C_i} m_{i-1} + \frac{-C_{i+1}}{C_i} m_{i+1} + \cdots + \frac{-C_{s_2}}{C_i} m_{s_2}. \]

In other words, the column vector $m_i$ is a linear combination of the column
vectors $m_1, \ldots, m_{i-1}, m_{i+1}, \ldots, m_{s_2}$. Hence, the dimension of the
column vector space spanned by $m_1, \ldots, m_{i-1}, m_{i+1}, \ldots, m_{s_2}$ is the
same as the dimension of the column vector space spanned by
$m_1, \ldots, m_{s_2}$. As a consequence, the rank of the matrix formed by the
columns $m_1, \ldots, m_{i-1}, m_{i+1}, \ldots, m_{s_2}$ (viz. $X$) must be the same as
that of the matrix formed by all the column vectors (viz. $N$), i.e.,
$rank(X) = rank(N)$. □
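Lemma 3.1 can be checked numerically on small matrices. The sketch below is only an illustration under stated assumptions: the helper names `rank`, `delete_column` and `rank_preserved_under_deletion` are hypothetical, and the two hard-coded 2 x 2 matrices are the univariate Dixon (Bezout) matrices of {x^2 - 3x + 2, x^2 - 1}, which share the zero x = 1, and of {x^2 - 3x + 2, x^2 - 9}, which share no zero. Both columns (monomials 1 and x) are taken to be in nvcol(C), which holds for the constraint x ≠ 0.

```python
from fractions import Fraction


def rank(M):
    """Rank of a rational matrix by fraction-exact forward elimination."""
    M = [[Fraction(v) for v in row] for row in M]
    rows = len(M)
    cols = len(M[0]) if M else 0
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
        if r == rows:
            break
    return r


def delete_column(M, k):
    """Copy of M with column k removed."""
    return [row[:k] + row[k + 1:] for row in M]


def rank_preserved_under_deletion(N, nvcol):
    """True if deleting any single column listed in nvcol leaves rank(N) unchanged."""
    full = rank(N)
    return all(rank(delete_column(N, k)) == full for k in nvcol)


N_common = [[3, -3], [-3, 3]]       # the polynomials share the zero x = 1
N_disjoint = [[27, -11], [-11, 3]]  # the polynomials share no zero

print(rank_preserved_under_deletion(N_common, [0, 1]))
print(rank_preserved_under_deletion(N_disjoint, [0, 1]))
```

In the first case every column deletion preserves the rank, as the lemma requires. In the second case the rank drops from 2 to 1, so by the contrapositive of the lemma no common zero satisfying the constraint can exist.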

Now let us get back to the case when the polynomials have coefficients from
$\mathcal{P}$. Let $\mathcal{F} \subset \mathcal{P}[x_1, \ldots, x_n]$ be a set of $n + 1$
polynomials and $D$ the Dixon matrix of $\mathcal{F}$ (with entries now from
$\mathcal{P}$). Let $r = rank(D)$. Also, let $\mathcal{D}_1$ be the set of all
$s_1 \times (s_2 - 1)$ matrices obtained from $D$ by deleting a column which is an
element of $nvcol(C)$ (just as $\mathcal{N}_1$ was obtained from $N$ previously).

Let $\phi : \{a_1, \ldots, a_m\} \to \bar{\mathbb{Q}}$ be a mapping which gives values
to the parameters from the algebraic closure of $\mathbb{Q}$. By abuse of
notation, $\phi(\mathcal{F})$, $\phi(D)$ and $\phi(\mathcal{D}_1)$ are the results of
substituting those values for the parameters in $\mathcal{F}$, $D$ and
$\mathcal{D}_1$ respectively and removing all zero rows and columns from $D$ and
the elements of $\mathcal{D}_1$. Finally, let

$\mathcal{R} = \{Y \mid Y$ is an $r \times r$ nonsingular submatrix of $D\}$.

Note that $\mathcal{R}$ is never an empty set (because $rank(D) = r$) and
moreover, for all $Y \in \mathcal{R}$, $det(Y) \ne 0$ (because elements of
$\mathcal{R}$ are nonsingular). Now we have the following theorem.

Theorem 3.2  If $\exists X \in \mathcal{D}_1$ s.t. $rank(X) < rank(D)$, then for all
$Y \in \mathcal{R}$, $\phi(det(Y))$ vanishes if $\phi(\mathcal{F})$ has a common affine
zero which satisfies $C$.

Proof:  Let $\mathcal{G} = \phi(\mathcal{F})$; then it is easy to see that
$N = \phi(D)$ is the Dixon matrix of $\mathcal{G}$ and
$\mathcal{N}_1 = \phi(\mathcal{D}_1) - \{N\}$ is the set of all submatrices of $N$
obtained by deleting a column which is an element of $nvcol(C)$ from $N$. So
by the previous lemma, if $\mathcal{G} = \phi(\mathcal{F})$ has a common affine zero
which satisfies $C$ then

\[ \forall X \in \mathcal{D}_1,\quad rank(\phi(D)) = rank(\phi(X)). \tag{1} \]

Also, the rank of any matrix with entries from $\mathcal{P}$ cannot increase
when values from $\bar{\mathbb{Q}}$ are substituted for the parameters, so

\[ \forall X \in \mathcal{D}_1,\quad rank(\phi(X)) \le rank(X). \tag{2} \]

If there is an $X \in \mathcal{D}_1$ which satisfies the premise of the theorem,
then

\[ rank(X) < rank(D) = r, \tag{3} \]

and from equations (1), (2) and (3) we get

\[ rank(\phi(D)) = rank(\phi(X)) \le rank(X) < rank(D) = r, \]

i.e., $rank(\phi(D)) < r$.

This means that once the values of the parameters are substituted in $D$, its
rank becomes less than $r$, i.e., all the $r \times r$ submatrices of $D$
(including the elements of $\mathcal{R}$) become singular, which means that their
determinants vanish at these values of the parameters. □

Theorem 3.2 suggests the following algorithm to obtain $\varrho_C$. Check if
$\exists X \in \mathcal{D}_1$ s.t. $rank(X) < rank(D)$. If this is true, then the
determinant of any element of $\mathcal{R}$ gives the required polynomial. But we
need to be able to efficiently perform the check, obtain an element of
$\mathcal{R}$ and finally compute its determinant. To perform the check, the
following lemma is of help.

Let $w = (w_1, \ldots, w_{s_2})^T$ be the $s_2 \times 1$ vector which satisfies the
matrix equation $Dw = 0$; then

Lemma 3.3  $\exists X \in \mathcal{D}_1$ s.t. $rank(X) < rank(D)$ if and only if
there exists some $0 < i \le s_2$ such that $w_i = 0$ and
$C \Rightarrow monom(m_i) \ne 0$.

Proof:  First of all, let us write the expanded version of the matrix
equation $Dw = 0$:

\[ m_1 w_1 + m_2 w_2 + \cdots + m_{s_2} w_{s_2} = 0. \]

Only if: Let $X \in \mathcal{D}_1$ s.t. $rank(X) < rank(D)$. Let $m_i$ be that
column of $D$ whose deletion resulted in $X$. Then, by definition of
$\mathcal{D}_1$, $C \Rightarrow monom(m_i) \ne 0$. Also, $rank(X) < rank(D)$ implies
that $m_i$ is linearly independent of the other columns. This means that
$w_i = 0$ because otherwise the above equation can be divided by $w_i$ and
then, by rearranging terms, $m_i$ can be expressed as a linear combination of
the other columns, implying that $m_i$ is linearly dependent and resulting in
a contradiction.

If: Assume that there exists some $0 < i \le s_2$ such that $w_i = 0$ and
$C \Rightarrow monom(m_i) \ne 0$. The latter assumption means that the matrix
obtained by deleting $m_i$ from $D$ (say $X$) must be in $\mathcal{D}_1$. But
also, since $w_i = 0$, it follows that column $m_i$ is linearly independent of
the other columns of $D$, hence implying that $rank(X) < rank(D)$. □

Finally, in accordance with theorem 3.2, we would like
to get the determinant of any element in $\mathcal{R}$, which will give
the required necessary condition on the parameters. This
is achieved by means of the following lemma. Let $D_{row}$ have the
following properties:

1. $D_{row}$ is row-reduced, i.e., each column of $D_{row}$ which
contains the leading non-zero entry of some row has
all its other entries 0.

2. $D_{row}$ is row-equivalent to D, i.e., $D_{row}$ can be obtained
from D by a finite sequence of the following two steps:

(a) Elimination step: Replacement of the i-th row of
D by the i-th row plus d times the j-th row, where d is any
rational function in the parameters, and $i \ne j$.

(b) Pivoting step: Interchange two rows (say the
i-th and the j-th rows) of D.

Notice that $D_{row}$ can be constructed from D by simple
Gaussian elimination and many computer algebra systems
already support such an operation ($D_{row}$ = gausselim(D)
in MAPLE).


Lemma 3.4 There exists some $Y \in \mathcal{R}$ such that the product
of all the pivot elements of $D_{row}$ is equal to $\det(Y)$.

Proof: Assign to the j-th row of D a label $l_j$. Now obtain the
matrix $D_{row}$ which is row-equivalent to D by successively
applying a sequence of the above two steps, except that
whenever the pivoting step is applied, interchange the labels
of the rows too. In the elimination step, the labels remain
the same. Since the rank of D is r, $D_{row}$ will have r pivots.
Let $m_{i_1}, \ldots, m_{i_r}$ be the pivot columns, and $l_{j_1}, \ldots, l_{j_r}$
the labels of the pivot rows of $D_{row}$. Then it is easy to show
that the product of all pivot elements of $D_{row}$ is the
determinant of the r × r submatrix (say Y) of D obtained from
the rows labelled by $l_{j_1}, \ldots, l_{j_r}$ and the columns indexed
by $m_{i_1}, \ldots, m_{i_r}$. Since the product of pivot elements
is not identically zero, Y is not singular so it is in $\mathcal{R}$ by
definition. □
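The claim of Lemma 3.4 can be checked on a small numeric instance. The sketch below is only an illustration under stated assumptions: it uses a matrix of rational numbers in place of the symbolic Dixon matrix, and the function names (`pivots_with_labels`, `det`, `pivot_product_and_minor`) are hypothetical, not part of the MAPLE implementation described in the paper. It performs forward elimination with row swaps, carries the row labels along as in the proof, and compares the product of the pivots with the determinant of the labelled submatrix of the original matrix.

```python
from fractions import Fraction
from itertools import permutations


def pivots_with_labels(D):
    """Forward Gaussian elimination with row swaps, tracking row labels."""
    A = [[Fraction(x) for x in row] for row in D]
    labels = list(range(len(A)))
    n_rows, n_cols = len(A), len(A[0])
    pivots, pivot_cols = [], []
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, n_rows) if A[i][c] != 0), None)
        if p is None:
            continue
        # Pivoting step: swap the rows and their labels.
        A[r], A[p] = A[p], A[r]
        labels[r], labels[p] = labels[p], labels[r]
        # Elimination step: clear column c below the pivot.
        for i in range(r + 1, n_rows):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(A[r][c])
        pivot_cols.append(c)
        r += 1
    return pivots, labels[:r], pivot_cols


def det(M):
    """Leibniz-formula determinant; adequate for the tiny matrices used here."""
    n = len(M)
    total = Fraction(0)
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = Fraction(sign)
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total


def pivot_product_and_minor(D):
    """Return (product of pivots, det of the labelled r x r submatrix of D)."""
    pivots, rows, cols = pivots_with_labels(D)
    prod = Fraction(1)
    for p in pivots:
        prod *= p
    Y = [[Fraction(D[i][j]) for j in cols] for i in rows]
    return prod, det(Y)


# A 3 x 4 matrix of rank 2 (third row = first row + second row).
print(pivot_product_and_minor([[0, 2, 1, 3], [1, 1, 0, 2], [1, 3, 1, 5]]))
```

Taking the rows of Y in the order of the permuted labels is what makes the two numbers agree exactly rather than only up to sign.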

Based on theorem 3.2 and lemmas 3.3 and 3.4, we can
get the following algorithm. Assume that the set $\mathcal{F}$ of n + 1
polynomials in the variables $x_1, \ldots, x_n$ and the constraints C are
given. The following algorithm checks if the precondition
in theorem 3.2 is true, in which case it returns $q_C$. If the
precondition of theorem 3.2 is not true, then this heuristic
fails.

1. Set up the $s_1 \times s_2$ Dixon matrix (D) of $\mathcal{F}$.

2. Solve the matrix equation $Dw = 0$. Let $w = (w_1, \ldots,
w_{s_2})$ denote the solution.

3. Find out if there exists a $w_i$ in w such that $w_i = 0$
and also $C \Rightarrow \mathrm{monom}(m_i) \ne 0$. If such a $w_i$ exists
then

(a) Compute $D_{row}$.

(b) Return the product of all the pivots of $D_{row}$.

4. Else, this heuristic fails.
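The linear-algebra part of the check in step 3 can be sketched as follows. This is a minimal illustration, assuming a matrix of rational numbers instead of the symbolic Dixon matrix; `rank` and `independent_columns` are hypothetical helper names. By Lemma 3.3, a column whose component in the solution of Dw = 0 is zero is exactly a column whose deletion lowers the rank, so the sketch tests that condition directly. It does not test the second condition, that C implies the corresponding monomial is nonzero, which depends on the constraints of the problem at hand.

```python
from fractions import Fraction


def rank(M):
    """Rank of a rational matrix by Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0]) if A else 0):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r


def independent_columns(D):
    """Indices i such that deleting column i lowers the rank of D."""
    full = rank(D)
    result = []
    for i in range(len(D[0])):
        X = [row[:i] + row[i + 1:] for row in D]
        if rank(X) < full:
            result.append(i)
    return result


# Column 2 is independent of columns 0 and 1 (which are proportional).
print(independent_columns([[1, 2, 0], [2, 4, 0], [0, 0, 1]]))
```

If the returned list is empty, no column qualifies and the heuristic fails at step 4.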

Theorem 3.5 If the Dixon matrix of a set of polynomials
satisfies the precondition of theorem 3.2, then the above
algorithm computes $q_C$.

Proof :  Follows  directly  from  theorem  3.2  and  lemmas  3.3 
and  3.4.  □ 

There are two points to be noted here. First, though
there exist examples where the condition in step (3) is not
true, they are rare. In all the examples from geometry that
we tried, D was singular many times, but the condition of
step (3) always held. Secondly, once the polynomial in the
parameters is obtained from the algorithm, it can usually
be factored. Since the vanishing of this polynomial is only
a necessary condition, occasionally it will be the case that
some of the factors do not vanish on any of the solutions.
These factors can be removed to obtain a smaller $q_C$. This is
done manually, as suggested by the specific algebraic or
geometric problem. Our experience has been that it is usually easy
to identify such extraneous factors, but automatic methods
to accomplish this need to be further explored. In the next
section, we discuss some examples.

4  Geometric  Reasoning  using  Dixon  resultants 

We implemented the proposed algorithm on a SPARCstation
10 in MAPLE using primitive operations such as
linsolve, gausselim, etc., available in its linear algebra
package. No attempt was made to optimize the Maple code. We
believe the algorithm can be made much faster using
interpolation techniques for computing determinants as reported
by Canny and Manocha in [13].

Two  geometric  identities  derived  using  the  algorithm  are 
presented  in  this  section.  A  few  more  identities  will  be  pre¬ 
sented  without  details  in  a  later  section. 

Example  1:  Side-Bisector  Relation 

Let ABC be a triangle as in the following figure, a, b and
c the lengths of the sides BC, AC and AB, $a_i$ and $a_e$ the
lengths of the internal (AD) and external (AE) bisectors of angle
A, and $b_e$ the length of the external angle bisector (BF) of
angle B. The objective is to express a in terms of $a_i$, $a_e$ and
$b_e$.

This problem was first posed by Heymann in [11]. He
wanted to determine whether, given general values of the three
angle bisectors, one can draw the triangle using only a compass
and a ruler. This is possible if and only if the expression
involving a, $a_i$, $a_e$ and $b_e$ is of degree $2^m$ in a for some integer
m (see corollary 2 of Theorem 5.4.1 in [10]). This example
was posed again in [8], where the problem was solved, and
we have presented it in exactly the same way.


It is a standard result of Euclidean geometry that:

$a_i^2 = \frac{cb(c+b-a)(c+b+a)}{(b+c)^2}$

$a_e^2 = \frac{cb(a+b-c)(c-b+a)}{(c-b)^2}$

$b_e^2 = \frac{ac(a+b-c)(c+b-a)}{(c-a)^2}$

Hence, it is easy to express the lengths of the bisectors in
terms of the lengths of the sides. The challenge is to express
the lengths of the sides in terms of the lengths of these three
bisectors. In principle, this is a simple elimination problem,
i.e., if we can eliminate the variables b and c from any two
equations, then we can plug the expressions for these two
variables into the third equation to obtain an expression for
a in terms of only $a_i$, $a_e$ and $b_e$.

This  can  be  achieved  by  computing  the  resultant  of  these 
three  polynomials  w.r.t.  the  two  variables,  b  and  c.  To  this 
effect,  let  us  first  represent  these  equations  as  polynomials 
by  transforming  them  to  the  following: 

$q_1 = a_i^2 (b+c)^2 - cb(c+b-a)(c+b+a)$

$q_2 = a_e^2 (c-b)^2 - cb(a+b-c)(c-b+a)$

$q_3 = b_e^2 (c-a)^2 - ac(a+b-c)(c+b-a)$

The objective is to eliminate b and c without any constraints
on them. We discuss the trace of our algorithm on this set
in the next paragraph.
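As a sanity check on the three side-bisector relations, the following sketch computes the squared bisector lengths directly from coordinates and confirms that the three polynomials vanish. It is only an illustration: the triangle with sides 5, 6, 7, the coordinate placement B = (0, 0), C = (a, 0), and the function names are assumptions made for this example, and it requires b ≠ c and c ≠ a so that the external bisectors meet the opposite sides. The polynomials are taken as $q_1 = a_i^2(b+c)^2 - cb(c+b-a)(c+b+a)$, $q_2 = a_e^2(c-b)^2 - cb(a+b-c)(c-b+a)$ and $q_3 = b_e^2(c-a)^2 - ac(a+b-c)(c+b-a)$.

```python
from fractions import Fraction


def bisector_squares(a, b, c):
    """Squared lengths of AD, AE and BF with B = (0, 0) and C = (a, 0)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    ax = (a * a + c * c - b * b) / (2 * a)  # x-coordinate of A
    ay2 = c * c - ax * ax                   # squared y-coordinate of A
    dx = c * a / (b + c)                    # D divides BC internally, BD:DC = c:b
    ex = c * a / (c - b)                    # E divides BC externally, BE:EC = c:b
    # F divides AC externally with AF:FC = c:a, i.e. F = (c*C - a*A) / (c - a).
    fx = (c * a - a * ax) / (c - a)
    fy2 = a * a * ay2 / (c - a) ** 2
    ai2 = (ax - dx) ** 2 + ay2
    ae2 = (ax - ex) ** 2 + ay2
    be2 = fx ** 2 + fy2
    return ai2, ae2, be2


def q_values(a, b, c, ai2, ae2, be2):
    """Values of the three polynomials for given sides and squared bisectors."""
    q1 = ai2 * (b + c) ** 2 - c * b * (c + b - a) * (c + b + a)
    q2 = ae2 * (c - b) ** 2 - c * b * (a + b - c) * (c - b + a)
    q3 = be2 * (c - a) ** 2 - a * c * (a + b - c) * (c + b - a)
    return q1, q2, q3


ai2, ae2, be2 = bisector_squares(5, 6, 7)
print(ai2, ae2, be2)
print(q_values(5, 6, 7, ai2, ae2, be2))
```

Exact rational arithmetic is used so that the vanishing of the polynomials is not obscured by rounding.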


First the Dixon polynomial was computed, followed by
the Dixon matrix (M) of this set, which turned out to be 13 ×
14. Solving the matrix equation Mx = 0 took 198 seconds and
resulted in a vector whose component corresponding to the
monomial 1 is 0. Hence the condition in step
(3) of the algorithm is true. Gaussian elimination was then
performed, which took 284 seconds. The
product of the pivot elements of the matrix after Gaussian
elimination is the necessary condition that a, $a_i$, $a_e$ and $b_e$
must satisfy for any triangle ABC. The total computation
took 501 seconds. After removing extraneous factors, the
smallest necessary condition was obtained; it contains
330 terms. The result is the same as the one reported in [8].
Since the expression is of degree 20 in a, which is not a power
of 2, the triangle cannot be constructed using a compass and a
ruler, given $a_i$, $a_e$ and $b_e$ (see corollary 2 of Theorem 5.4.1 in [10]).

Gao and Wang in [8] reported that they solved the problem
by successively eliminating b and c using a combination
of pseudo-division and Sylvester's resultant computation.
Their implementation in Lisp took about 19 hours
on a SUN 4 workstation. Our algorithm took less than 17
minutes on a SUN 4, and less than 9 minutes on a
SPARCstation 10. □

Example  2:  Maximum  Volume  of  a  Tetrahedron 

Consider a tetrahedron ABCD as below:

[Figure: tetrahedron with vertices A, B, C and D]

The objective is to determine the maximum volume that
a tetrahedron can have, given that the squares of the areas
of the four faces ABC, ACD, BCD and ABD are a, b, c
and d respectively.

It has been established in [9] that if there exist parameters
x, y, z and w which satisfy the following four equations:

$yz + zw + wy - a = 0$

$zx + xw + wz - b = 0$

$wx + xy + yw - c = 0$

$xy + yz + zx - d = 0$

then the tetrahedron is an orthocentric tetrahedron and
hence the one with the maximum volume for these surface
areas. Moreover, the square of the volume (T) of this
tetrahedron is:

$T = \frac{2}{9}(xyz + yzw + zwx + wxy)$
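A quick consistency check of these relations is possible on the regular tetrahedron. The sketch below is an illustration under stated assumptions: it reads the volume relation as T = (2/9)(xyz + yzw + zwx + wxy), which is what the fifth polynomial 2(xyz + yzw + zwx + wxy) - 9T = 0 gives, it picks the parameter values x = y = z = w = 1 by hand, and the function names are hypothetical. A regular tetrahedron with edge 2 has face area sqrt(3) and volume 8/(6 sqrt(2)), so the squared face areas should all be 3 and the squared volume 8/9.

```python
from fractions import Fraction


def face_area_squares(x, y, z, w):
    """Squared face areas (a, b, c, d) given by the four equations."""
    return (y * z + z * w + w * y,
            z * x + x * w + w * z,
            w * x + x * y + y * w,
            x * y + y * z + z * x)


def volume_square(x, y, z, w):
    """Squared volume T, read off from 2(xyz + yzw + zwx + wxy) - 9T = 0."""
    return Fraction(2, 9) * (x * y * z + y * z * w + z * w * x + w * x * y)


# Independent values for a regular tetrahedron with edge length s = 2.
s = 2
regular_area_sq = Fraction(3, 16) * s ** 4   # (sqrt(3)/4 * s^2)^2
regular_volume_sq = Fraction(s ** 6, 72)     # (s^3 / (6*sqrt(2)))^2

print(face_area_squares(1, 1, 1, 1), regular_area_sq)
print(volume_square(1, 1, 1, 1), regular_volume_sq)
```

The agreement on this one symmetric case does not prove the general statement; it only confirms that the equations as written are mutually consistent.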

How does one get the value of T purely in terms of a, b,
c and d? This problem translates to eliminating x, y, z and
w from the above mentioned five equations, i.e., computing
the resultant of the following five polynomials:

$q_1 = yz + zw + wy - a$

$q_2 = zx + xw + wz - b$

$q_3 = wx + xy + yw - c$

$q_4 = xy + yz + zx - d$

$q_5 = 2(xyz + yzw + zwx + wxy) - 9T$

with respect to x, y, z and w. It turns out that any of these
variables x, y, z and w being zero is a degeneracy, hence we
can work under the constraint $C = x \ne 0 \wedge y \ne 0 \wedge z \ne 0 \wedge w \ne 0$.
We now discuss the trace of the program.

The Dixon matrix was set up and it was found to be
of dimension 16 × 16. Solving the equation Mx = 0 took
10 seconds and it was found that the component of x
corresponding to the monomial x was 0. So the condition of
step 3 is satisfied and Gaussian elimination was performed,
which took 84 seconds. The necessary condition for the
existence of an affine zero satisfying C was computed as the
product of the pivot entries of the matrix which was
obtained after Gaussian elimination. It was found that there
was no extraneous factor in the necessary condition, hence
it is the smallest necessary condition and it contained 434
terms. The total time taken was 110 seconds. □


5  Advantages  of  Dixon  resultants 

In our experiments, we found this technique to be faster
than successive Sylvester resultant computation in both
cases: (1) two polynomials with a single variable to be
eliminated, and (2) more than two polynomials with more than
one variable to be eliminated. When the Dixon matrix is
singular, this technique was also found to be faster than
perturbation techniques. The reasons are as follows.

1. Smaller Determinant: In the one variable case,
it was seen in section 2.1 that the resultant is the
determinant of a max(m,n) × max(m,n) Dixon
matrix (where m and n are the degrees of the two
polynomials). This is a much smaller matrix than the
(m+n) × (m+n) matrix that one gets using Sylvester's
formulation. The smaller matrix leads to reduced
computation time.

2. Uniform Approach: When more than one variable
needs to be eliminated, Dixon's method works
better because all polynomials and variables are treated
uniformly. The method works particularly well
in the case of problems (or polynomials) which are
symmetric in the variables which need to be eliminated.
One possible reason is that Dixon's method
adopts a symmetric approach and a single Dixon
matrix is set up for all the polynomials together. This
is in contrast to the successive resultant computation
techniques which eliminate variables one by one, and
compute numerous intermediate resultants before
computing the resultant of the whole set.
This ordering among the variables breaks the
symmetry of the problem, because of which huge
intermediate polynomials are generated before the (relatively
small) resultant is obtained. This usually turns out to be costly.
Dixon's method avoids such intermediate polynomials
and hence is much faster and also saves space. This
is the main reason that our algorithm is able to prove
theorems and derive identities substantially faster than
other methods.




3. Faster than Perturbation: Perturbation usually
is an expensive operation as opposed to the approach
presented in this paper. Dixon resultants (and in fact
most resultant formulations in general) are sensitive
to the number of variables, the degree of each polynomial
and also the distribution of variables (a variable
occurring in a few polynomials is better than it occurring
in a lot of polynomials). If the Dixon matrix is
singular, then a generic perturbation (a la [3]) can also
be performed to obtain a projection operator which is
not identically zero. However, the perturbation variable
is introduced at the highest possible degree, in
every polynomial. Our method, on the other hand,
avoids perturbation in many cases. In the proposed
method, the projection operator is extracted from the
Dixon matrix of the original system, without any extra
computation. This results in a method which is
more efficient than perturbations and this is reflected
in all the examples of this paper.

4. Automatic Method: Methods based on variable
orderings or which employ successive elimination also
require human intervention. A good ordering must
be specified or the order in which elimination is
performed must be given. This seems unavoidable since
the time taken by successive elimination techniques
is sensitive to the variable ordering used. Dixon's
method, on the other hand, does not eliminate the
variables in any particular order. Instead, this method
directly computes the resultant, thus being fully
automatic. In fact, never once did we have to intervene
during the proofs of the geometry theorems and
algebraic identities mentioned in this paper.
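The size difference described in reason 1 can be seen on a small univariate case. The sketch below is an illustration only, assuming two polynomials of equal degree given as coefficient lists (lowest degree first); the function names are hypothetical. It builds the (m+n) × (m+n) Sylvester matrix and the n × n matrix of the Cayley-Dixon polynomial (f(x)g(y) - f(y)g(x)) / (x - y), and compares their determinants, which agree up to sign in this equal-degree setting.

```python
from fractions import Fraction


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, sign, result = len(A), 1, Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if A[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            sign = -sign
        for i in range(c + 1, n):
            f = A[i][c] / A[c][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[c])]
        result *= A[c][c]
    return sign * result


def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, lowest degree first)."""
    m, n = len(f) - 1, len(g) - 1
    rows = []
    for s in range(n):
        rows.append([0] * s + f[::-1] + [0] * (n - 1 - s))
    for s in range(m):
        rows.append([0] * s + g[::-1] + [0] * (m - 1 - s))
    return rows


def dixon_univariate(f, g):
    """Coefficient matrix of (f(x)g(y) - f(y)g(x)) / (x - y), deg f == deg g."""
    n = len(f) - 1
    B = [[0] * n for _ in range(n)]
    for p in range(n + 1):
        for q in range(p):
            c = f[p] * g[q] - f[q] * g[p]
            # (x^p y^q - x^q y^p) / (x - y) = sum_k x^(p-1-k) y^(q+k)
            for k in range(p - q):
                B[p - 1 - k][q + k] += c
    return B


f = [-1, 0, 1]   # x^2 - 1
g = [-4, 0, 1]   # x^2 - 4
S, B = sylvester(f, g), dixon_univariate(f, g)
print(len(S), len(B))
print(det(S), det(B))
```

For two quadratics the Sylvester matrix is 4 × 4 while the Dixon matrix is 2 × 2; the gap widens as the degrees grow.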

6  Empirical  results 

We  present  more  examples,  and  give  the  following  charac¬ 
teristics  about  each  example  in  table  1: 

(a)  Sing  :  Whether  the  Dixon  matrix  was  singular  (s)  or 
nonsingular  (n). 

(b)  Terms  :  Number  of  terms  in  the  smallest  necessary 
condition  for  affine  zeros. 

(c)  Dix  :  Time  taken  by  an  implementation  of  our  algo¬ 
rithm  in  MAPLE  on  a  SPARCstation  10. 

(d) Grob : Time taken by the MACAULAY system [1] on
a SPARCstation 10 for computing the Gröbner basis using
block ordering where the first block contains all the
variables and the second all the parameters. Variables within
the same block are degree ordered, and across the blocks,
they are lexicographically ordered.

(e)  Mac/GCP  :  Time  taken  to  compute  the  Macaulay 
resultant  (GCP  when  Macaulay  matrix  is  singular  [3])  using 
MAPLE  on  a  SPARCstation  10. 

A  (*)  indicates  that  either  the  program  went  on  for  more 
than  a  day,  or  it  ran  out  of  space. 

Example   Sing   Terms   Dix           Grob     Mac/GCP
1         s      330     501s          *        *
2         s      434     110s          585s     *
3         n      2424    9.6s          *        39s
4         s      990     [illegible]   15429s   *
5         n      781     [illegible]   2207s    850s

Table 1: Comparison of Various Methods

We tried successive resultant computation using the
pre-existing implementations of Sylvester's resultants and
Bezout's resultants in MAPLE for various variable orderings,
but the computation ran out of memory for every example.
We also tried to compute the lexicographic Gröbner
bases to obtain the conditions for the common zeros.
Neither MAPLE nor MACAULAY (Bayer and Stillman [1])
could compute the Gröbner basis of any of the examples in
this paper. For lexicographic ordering among all the
variables, MACAULAY ran for up to a day on some of these
examples and then ran out of memory. When block ordering
(as described in (d) above) was used, MACAULAY
successfully terminated after a long time on three of the examples
(2, 4 and 5), but the remaining two (1 and 3) went on for a
day and still did not terminate.

An attempt was made to compute the Macaulay resultant
(GCP in case the Macaulay matrix was singular [3]) in
MAPLE, but none of the GCP computations (examples 1, 2
and 4) ever finished. In all those cases, the computation ran
for up to a day before running out of space. The resultant
computations for the examples in which GCP was not required
(examples 3 and 5) successfully terminated, but took longer than
our method.

For all the examples in which the Dixon matrix is
singular, perturbation techniques ([3]) were also tried. One can
perturb the polynomials so that they become n-degree and
the Dixon matrix is no longer singular. This technique also
failed for each example in this paper because the Dixon
matrix blows up after perturbation, resulting in a
substantially larger Dixon matrix with larger polynomial
entries. The determinant computation for these matrices ran
out of space for all the examples in MAPLE.

For each example, the variables are one or more of x, y
and z. The polynomials whose resultant needs to be
computed are two or more of the $q_i$'s. The numbering of the
examples continues from the previous two examples.

Example 3: Expression for the distance of the intersection
of two general conics from the origin.

$q_1 = a_1 x^2 + a_2 xy + a_3 y^2 + a_4 x + a_5 y + a_6$

$q_2 = b_1 x^2 + b_2 xy + b_3 y^2 + b_4 x + b_5 y + b_6$

$q_3 = x^2 + y^2 - T$

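Example 4: Conditions for perpendicular intersection of a
general conic and a general circle.

$q_1 = a_1 x^2 + a_2 xy + a_3 y^2 + a_4 x + a_5 y + a_6$

$q_2 = x^2 + y^2 + b_1 x + b_2 y + b_3$

$q_3 = \frac{\partial q_1}{\partial x}\frac{\partial q_2}{\partial x} + \frac{\partial q_1}{\partial y}\frac{\partial q_2}{\partial y}$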

Example 5: Conditions for the following four equations to
have a common solution.

$q_1 = a_1 x + a_2 y + a_3 z + a_4$

$q_2 = b_1 xy + b_2 yz + b_3 zx$

$q_3 = c_1 xyz + c_2$

$q_4 = d_1 xyz + d_2 x + d_3 y + d_4 z$

7  Conclusion 

Dixon's method for simultaneously eliminating several
variables is discussed. The method is extended to the case
when the Dixon matrix is singular. An algorithm is given to
extract a nonzero projection operator for a subclass of the
systems of multivariate polynomials from its singular Dixon
matrix. This algorithm avoids perturbation.

A great deal of work needs to be done in further
investigating Dixon's method. In particular, there is a need
to further understand Dixon's method in the general case.
Determinant computations of matrices with polynomial
entries are a major bottleneck in the method. Fast methods
based on interpolation, similar to those in [13], need to be
investigated in order to make Dixon's method more widely
applicable.
Abstract

We demonstrate that viewpoint-invariant representations can be obtained
from images for a useful class of 3D smooth objects. The class consists of
surfaces generated as the envelope of a sphere of varying radius swept along an
axis. This class includes canal surfaces and surfaces of revolution.

The representations are computed, using only image information, from
the symmetry set of the object's outline. They are viewpoint-invariant under
weak-perspective imaging, and quasi-invariant to an excellent approximation
under perspective imaging. To this approximation, the planar axis of a canal
surface is recovered up to an affine ambiguity from perspective images.

Examples  are  given  of  the  representations  obtained  from  real  images,  which 
demonstrate  stability  and  object  discrimination,  for  both  canal  surfaces  and 
surfaces  of  revolution.  Finally,  the  representations  are  used  as  the  basis  for  a 
model-based  object  recognition  system. 


1  Introduction 

The  aim  of  this  work  is  to  extract  viewpoint-invariant  descriptions  of  3D  smooth 
objects  from  single  images.  These  descriptions  will  be  used  as  shape  descriptors  in  a 
model-based  recognition  system.  For  a  completely  general  object,  and  with  no  other 
information,  it  is  not  possible  to  recover  shape  or  invariant  descriptions  from  a  single 
image  (see  for  example  [7,  9,  16]  for  3D  point  sets).  However,  if  the  3D  structure  is 
constrained, then invariant descriptions can be obtained. Here we consider surfaces
belonging  to  a  particular  class  of  generalized  cylinder  (GC)  [1].  The  class  consists 
of  surfaces  generated  as  the  envelope  of  a  sphere  of  varying  radius  swept  along 
the  cylinder  axis  (which  need  not  be  straight).  Examples  include  pipes  or  tubes 
(‘canal  surfaces’)  where  the  sphere  radius  is  constant,  and  surfaces  of  revolution, 
where  the  axis  is  a  line.  This  class  generates  a  large  number  of  commonly  occurring 
manufactured  objects. 

The key idea here is that since the 'envelope of the profile^1 is the profile of the
envelope' [25] the image projection of this class of surface is an envelope of circles.

By inverting this process (recovering circles from the profile), the projection of the
axis and the scaling function can be extracted from the image.

^1 The image outline of the surface.

Previous  work  on  extracting  GC’s  from  images  has  had  different  or  more  limited 
goals:  first,  rather  than  perspective,  the  weak  perspective  imaging  approximation 
has  generally  been  used  (see  [20,  30],  where  earlier  references  are  given);  second,  the 
goal has been reconstruction rather than representation, and this requires a reference
cross-section  to  be  visible  in  the  image  [28];  third,  those  methods  that  have  produced 
invariants  under  perspective  [11]  have  only  utilized  a  number  of  points  on  the  image 
outline — not  the  entire  curve.  In  this  paper,  in  common  with  the  above,  only  the 
profile  is  used;  no  use  is  made  of  surface  markings  or  texture. 

The tool employed here is the symmetry set [5], studied by Giblin & Brassett [12],
which is the locus of centres of circles bitangent to a plane curve. Previous symmetry
analysis has largely concentrated on extracting bilateral (reflection) or rotational
symmetries from images of planar objects, or from a single silhouette of 3D objects,
viewed in a fronto-parallel plane [3, 4, 26]. The methods developed for those cases
cannot be applied if the viewpoint is not fronto-parallel since, for a planar object,
reflectional symmetries are then skewed by imaging [13, 17, 29]. For a 3D smooth
object additional distortions occur: the contour generator^2 moves on the surface
as  viewing  position  changes.  In  this  case  it  is  not  a  fixed  space  curve  which  is 
projected,  and  the  image  profile  can  change  radically  with  viewpoint,  defeating  any 
simple  application  of  skewed  symmetry. 


2  Theory 

In  the  following  we  consider  two  types  of  image  projection:  perspective  and  weak 
perspective.  In  both  cases  the  camera  aspect  ratio  must  be  known,  though  no  other 
intrinsic parameters are required. However, much of the construction uses only affine
or  projective  properties,  and  this  is  made  explicit. 

2.1  Weak  Perspective 

Consider  sweeping  a  sphere  of  varying  radius  along  the  axis  curve:  the  resulting 
surface  is  the  envelope  of  the  swept  sphere.  At  each  point  the  profile  of  the  sphere  is 
a circle, and the profile of the surface is the envelope of the circles. Under affine
projection the centre of the sphere projects to the centre of the circle. Consequently, the
circle’s  centre  sweeps  out  the  projection  of  the  axis.  This  construction  is  illustrated 
schematically  in  figure  1. 
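The image-plane half of this construction can be checked numerically: for a family of circles centred on a curve, each circle touches the envelope at two points, so the circle centres lie on the symmetry set of the envelope. The sketch below is an illustration under stated assumptions. The axis (a parabola), the radius function, and all function names are chosen for this example and do not come from the paper. The contact points follow from the envelope condition for the family |p - c(t)|^2 = r(t)^2, namely (p - c) · c' = -r r'.

```python
import math


def axis(t):
    """Projected axis: a parabola, chosen only for illustration."""
    return (t, 0.5 * t * t)


def axis_deriv(t):
    return (1.0, t)


def radius(t):
    """Slowly varying circle radius, chosen only for illustration."""
    return 0.5 + 0.1 * t


def radius_deriv(t):
    return 0.1


def envelope_points(t):
    """Two points where the circle centred at axis(t) touches the envelope."""
    (cx, cy), (dx, dy) = axis(t), axis_deriv(t)
    v = math.hypot(dx, dy)
    tx, ty = dx / v, dy / v            # unit tangent of the axis
    nx, ny = -ty, tx                   # unit normal of the axis
    r, rd = radius(t), radius_deriv(t)
    along = -r * rd / v                # offset along the tangent
    across = r * math.sqrt(1.0 - (rd / v) ** 2)  # offset along the normal
    return [(cx + along * tx + s * across * nx,
             cy + along * ty + s * across * ny) for s in (+1, -1)]


def tangency_defect(t, side, h=1e-6):
    """Radius vector dotted with the envelope tangent (central difference)."""
    p0 = envelope_points(t - h)[side]
    p1 = envelope_points(t + h)[side]
    p = envelope_points(t)[side]
    c = axis(t)
    return ((p[0] - c[0]) * (p1[0] - p0[0])
            + (p[1] - c[1]) * (p1[1] - p0[1])) / (2 * h)


for t in (-1.0, 0.0, 0.7):
    c = axis(t)
    for side, p in enumerate(envelope_points(t)):
        dist = math.hypot(p[0] - c[0], p[1] - c[1])
        print(t, side, dist - radius(t), tangency_defect(t, side))
```

Both printed residuals should be close to zero: the contact points lie on the circle, and the envelope tangent there is perpendicular to the radius, which is the bitangency used to recover the axis from the profile.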

Now  consider  two  such  circles:  as  the  scaling  arising  from  weak  perspective  is  the 
same  in  both  cases,  the  ratio  of  circle  radii  equals  the  ratio  of  radii  of  the  generating 
spheres.  The  usefulness  of  these  results  is  that  the  circles  can  be  recovered  from  the 
profile  by  constructing  the  symmetry  set,  the  locus  of  centres  of  circles  bitangent  to 
the  profile.  To  summarize: 

^2 The curve on the surface which projects to the image profile.
Figure 1: (a) Under weak perspective projection the profile (image outline) of a
sphere is a circle, and the sphere centre is projected to the circle centre. (b) The
surface is generated as the envelope of a sphere swept along an axis $\alpha$. (c) The
surface profile is generated as the envelope of a circle swept along $\sigma$, where the
image curve $\sigma$ is the projection of the axis curve $\alpha$. The figure illustrates the case
for a constant radius sphere, i.e. a canal surface. The profile is still generated (locally)
as the envelope of circles when the sphere radius varies.
Given a weak perspective image of a surface generated (locally) as the envelope of a
sphere of varying radius $R(S)$, with centres on a plane curve $\alpha(S)$ (the axis), the
symmetry set, computed from the image profile, has the following properties:

1.  The  contact  of  image  circles  with  the  profile  identifies  corresponding  points  on 
the  surface,  i.e.  points  which  lie  on  the  same  circular  cross-section. 

2. The curve covered by circle centres, $\sigma(s)$, is within a plane affine
transformation of the curve $\alpha(S)$.

3. The scaling function for radii of image circles, $r(s)$, equals the scaling function
for radii of the generating spheres $R(S)$ for corresponding points, $\alpha(S)$ and
$\sigma(s)$, up to an overall scale ambiguity.

Note,  only  the  second  item  requires  that  the  axis  be  planar.  The  other  properties 
hold  if  the  axis  is  a  space  curve. 

The curves $\{\sigma(s), r(s)\}$ are a viewpoint-invariant representation. $\alpha(S)$ is
determined up to an affine ambiguity, and consequently affine invariants of $\alpha(S)$ are equal
to affine invariants of $\sigma(s)$, and these are viewpoint-invariant. $R(S)$ is determined
up to scale. The parametrization of $\sigma(s)$ is described in section 2.4.

2.2  Perspective 

Under  perspective  projection  the  profile  of  a  sphere  is  an  ellipse,  and  the  centre  of 
the  sphere  does  not  project  to  the  ellipse  centre.  However,  in  practice,  for  finite 
image  planes  this  effect  is  extremely  small:  the  aspect  ratio  of  a  sphere’s  profile  is  at 
worst  0.94,  and  its  centre  is  displaced  at  most  by  1.2%  of  its  diameter,  even  at  the 
border of a wide-angle lens with a 46° field of view. This is an example of a
quasi-invariant [2]. Thus, the symmetry set will still be an excellent approximation of the
projected axis. However, the transformation between the axis and image curves will
be projective, rather than affine. To summarize:

Given a perspective image of a surface generated (locally) as the envelope of a sphere
of varying radius $R(S)$, with centres on a plane curve $\alpha(S)$ (the axis), the symmetry
set, computed from the image profile, (approximately) has the following properties:

1.  The  contact  of  image  circles  with  the  profile  identifies  corresponding  points  on 
the  surface  (i.e.  points  on  the  same  circular  cross-section). 

2. The curve covered by circle centres, $\sigma(s)$, is within a plane projective
transformation of the curve $\alpha(S)$.

The curve $\{\sigma(s)\}$ is a viewpoint-invariant representation. $\alpha(S)$ is determined
up to a projective ambiguity, and consequently projective invariants of $\sigma(s)$ are
viewpoint-invariant. Furthermore, it is shown in section 4.2 that for canal surfaces
with at least two inflections, $\alpha(S)$ is determined up to an affine ambiguity, even
under perspective distortion.
Figure 2: The ribbon {(5(5), ^'(5)} is generated by sweeping a line of varying length,
such that the line's mid-point lies on the axis $\alpha$, and the line is orthogonal to the
axis.  In  general  the  symmetry  set  of  this  type  of  ribbon  will  not  coincide  with  the 
axis.  For  example,  the  centre  c  of  the  symmetry  set  at  p,  p'  does  not  lie  on  a. 

2.3  Surface  Class  and  Special  GCs 

The surface class, S, considered here consists of those surfaces generated as the
envelope  of  a  sphere  swept  along  a  plane  curve.  For  these  surfaces,  by  construction, 
a  sphere  with  a  circle  of  points  tangent  to  the  surface  is  one  of  the  generating  spheres, 
and  will  have  its  centre  on  the  axis.  Consequently,  the  projected  axis  and  profile 
symmetry  set  coincide. 

Generalized  cylinders  are  often  defined  in  terms  of  sweeping  a  planar  cross-section 
along  an  axis  [1].  However,  if  a  surface  is  generated  by  sweeping  circles  of  varying 
radius  orthogonal  to  a  planar  axis,  then  the  centre  of  a  sphere  with  a  circle  of 
points  tangent  to  the  surface  will  not,  in  general,  coincide  with  the  axis.  To  see 
this, consider the intersection of such a surface with the plane of the axis. The
intersection is a pair of plane curves, {δ(s), δ'(s)}, which are generated by the ends
of a swept line. The line is orthogonal to the axis, with its mid-point on the axis, and
the line length varies according to the radius of the circle. In general, the symmetry
set of {δ(s), δ'(s)} will not coincide with the axis. Consequently, the centre of a
sphere with a circle of points tangent to the surface does not, in general, lie on the
axis. An example is shown in figure 2. A full discussion of the 2D case is given by Ponce [21],
where the line and circle sets are termed Brooks and Blum ribbons respectively.

Two  special  cases  of  surfaces  in  S,  which  are  defined  by  additional  constraints, 
are  considered  throughout  the  rest  of  the  paper.  In  the  following  we  list  a  number 
of properties for these cases which do not hold for an unconstrained member of S.
These  properties  are  used  during  the  image  processing  (grouping  and  symmetry  set 
extraction),  and  in  the  construction  of  a  canonical  frame.  Both  these  special  cases 
can  also  be  generated  by  sweeping  circles. 

2.3.1  Canal  surfaces 

Here the additional constraint is that the scaling function is constant.
These  surfaces  are  also  called  Circular  Planar  Right  Generalized  Cylinders  with 
constant  cross-section. 

Figure 3: For a canal surface with a planar axis, inflections in the profile occur in
pairs  for  each  inflection  of  the  axis.  The  intersection  of  a  pair  of  inflection  tangents 
determines  the  vanishing  point  of  the  tangent  line  at  the  axis  inflection.  Two  such 
vanishing  points  determine  the  vanishing  line,  L,  of  the  plane  of  the  axis.  This  line 
can  be  used  to  group  further  profile  inflection  tangents  (i.e.  the  pairs  of  tangents 
must  intersect  on  this  line). 

1. Under weak perspective projection, the two sides of the profile are parallel
curves  of  the  symmetry  set  (the  projection  of  the  axis).  This  follows  directly 
from  the  profile  curves  being  the  envelope  of  constant-radius  circles  swept 
along  the  symmetry  set. 

2.  Inflections  in  the  axis  produce  inflection  pairs  on  the  profile.  Tangents  at 
corresponding  profile  inflections  (on  each  side  of  the  profile)  intersect  on  the 
vanishing  line  of  the  axis  curve’s  plane.  Two  such  intersections  determine 
the  vanishing  line,  and  hence  the  extraction  of  affine  curve  measurements;  see 
figure  3.  This  relationship  is  exact — it  is  not  a  quasi-invariant.  Note:  this 
constraint  also  applies  to  line  segments  on  the  profile  (a  line  is  a  ‘degenerate’ 
inflection),  so  can  be  applied  to  a  piecewise  linear  axis. 
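
The construction of figure 3 can be sketched with homogeneous coordinates, where both the meet of two lines and the join of two points are cross products. This is a minimal illustration, not the authors' implementation; the function names `cross`, `line_through` and `vanishing_line` are hypothetical, and the tangent lines are assumed to have been extracted already.

```python
def cross(u, v):
    """Cross product of two homogeneous 3-vectors.

    Applied to two lines it gives their intersection point;
    applied to two points it gives the line joining them.
    """
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_line(tangent_pair_1, tangent_pair_2):
    """Vanishing line of the axis plane from two pairs of inflection tangents.

    Each argument is a pair of homogeneous lines: the tangents at
    corresponding profile inflections on the two sides of the profile.
    """
    v1 = cross(*tangent_pair_1)   # vanishing point from the first pair
    v2 = cross(*tangent_pair_2)   # vanishing point from the second pair
    return cross(v1, v2)          # line joining the two vanishing points
```

Further tangent pairs can then be grouped by checking that their intersection lies (approximately) on the returned line.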

2.3.2  Surfaces  of  revolution 

Here  the  additional  constraint  is  that  the  axis  is  straight.  Consequently,  the  scaling 
function is the most informative component of the viewpoint-invariant representation.
A surface of revolution is a special case of a Straight Homogeneous Generalized
Cylinder. 

1.  Tangents  at  corresponding  points  on  the  two  sides  of  the  profile  (i.e.  images  of 
points  on  the  same  circular  cross-section)  intersect  on  the  projected  symmetry 
axis  [20]. 

2.  In  particular,  corresponding  profile  bitangents  intersect  at  an  image  point  p, 
which  is  the  image  of  P,  the  point  of  intersection  between  the  object  axis  and 
planes  bitangent  to  the  surface.  P  is  viewpoint-independent  [11],  and  p  can 
be  identified  in  any  image. 

3. The profile can be separated into two ‘sides’, which are related by a planar
harmonic homology [17, 18] (i.e. a projective transformation T such that T² = I
[27]). T provides point to point correspondence between the sides of the
profile, thus greatly simplifying the processing. Provided the field of view
is not too large, T is approximately affine [15]; this is another example of a
quasi-invariant.

Figure 4: Three of the fitted circles for a surface of revolution. Note that the centre’s
position on the axis of symmetry, s, is not a monotonic function of distance along
the profile, p.


2.4  Curve  Parametrization 

Consider  the  motion  of  circle  centres  for  a  point  progressing  along  the  profile.  If 
the  circle  radius  increases,  then  the  centre  can  reverse  direction  and  ‘double  back’ 
on  itself.  This  does  not  occur  for  canal  surfaces  (constant  radius)  but  will  occur 
in  general,  and  is  clearly  demonstrated  in  figure  4.  Correspondingly,  the  centres 
of  the  spheres  generating  the  surface  (a  3D  symmetry  set)  double  back.  There  are 
thus two parametrizations to be considered: first, the parametrization of the axis,
s (which is a monotonic function of distance along the axis of, for example, a
surface of revolution); and second, the parametrization of the symmetry set. This
latter parametrization is not, in general, a monotonic function of s. The figures give
results parametrized by s. If, instead, symmetry set parametrization were used, it
would  avoid  the  doubling  back  in  figures  9  and  10,  though  there  would  still  be  cusps 
in  the  graph. 

2.5  Self-Occlusion 

The discussion to this point has not considered cases where the surface occludes
itself under projection. Since the axis α is planar, the points on the imaged axis
σ are, at worst, a plane projective transformation of points on the axis. Provided
the axis is a smooth curve, then its image is also smooth since a plane projective
transformation does not introduce cusps. Consequently, the symmetry set of the
profile will be smooth, even if the profile contains cusps. (It is assumed that the
symmetry set is determined from correctly corresponding points of the profile even
when the profile self-intersects, for example, in a swallowtail.)

Figure 5: N and N' are normals to the profile curves at p and p' respectively. V is
the unit vector orthogonal to p − p'. A circle bitangent to the curves at those points
has centre c at the intersection of N and N', with r = r' and θ = θ'. p1 and p2 illustrate
corresponding and non-corresponding points respectively. One possible measure
of correspondence is |r − r'|; but this is insensitive and requires normalization, since
it varies with ||p − p'||. The implemented measure is |sin θ − sin θ'|.

For an opaque object, parts of the contour generator are occluded, and consequently
parts of the profile are ‘missing’ (compared to the profile of a semi-transparent
object). Again, provided the symmetry set is determined for correctly
corresponding  points  on  the  profile,  the  symmetry  set  will  be  smooth,  but  some 
parts  of  the  axis  may  not  appear  in  the  image  (i.e.  the  symmetry  set  may  contain 
gaps  corresponding  to  the  ‘missing’  parts  of  the  profile). 

For  a  surface  of  revolution  self-occlusion  occurs  simultaneously  (i.e.  at  the  same 
circular  cross  section)  on  both  sides  of  the  profile.  This  is  not  the  case,  in  general, 
for  a  canal  surface. 

3  Extracting  the  Symmetry  Set 

3.1  Initial  Processing 

Edges  are  extracted  to  sub-pixel  accuracy  using  a  local  implementation  of  the  Canny 
edge  operator  [8].  These  are  linked  into  edgel-chains  by  a  sequential  linker  which 
extrapolates  over  small  gaps.  Accurate  curve  normals  are  computed  at  each  point 
of  the  edgel-chain  by  locally  fitting  a  quadratic  using  least  squares  regression  with 
central  Gaussian  weighting  on  a  thirteen  point  neighbourhood. 

Figure 6: Upper: images from substantially different viewpoints of the same canal
surface  (pipe  1).  Lower:  the  profiles  and  extracted  symmetry  sets.  Note  the  radically 
different  shapes  of  the  profiles  and  symmetry  sets.  However,  in  a  canonical  frame 
the  three  symmetry  sets  are  virtually  identical — see  figure  7c. 


3.2  Canal  Surfaces 

3.2.1  Bitangent  circles 

The procedure for generating the symmetry set involves working along two curves,
γ and γ', simultaneously. The aim is to fit a circle which is tangent simultaneously
to points p and p' on each curve. If such a circle exists then its centre lies on the
intersection of the normals at p and p', and is equally distant from p and p'. To
determine such points, a ‘correspondence measure’ is computed as

$$M(\mathbf{p}, \mathbf{p}') = |\mathbf{V}\cdot\mathbf{N} - \mathbf{V}\cdot\mathbf{N}'| = |\sin\theta - \sin\theta'|;$$

see figure 5.

For  a  given  pair  of  points,  the  nearer  to  zero  the  measure,  the  closer  those  points 
are  to  corresponding.  A  point  on  the  symmetry  set  is  generated  by  selecting  a 
point,  p,  on  one  curve,  and  then  determining  the  point  on  the  other  curve,  p',  which 
matches most closely according to the value of M(p, p'). The symmetry set point
is  calculated  analytically  by  intersecting  the  normals  of  the  least  squares  quadratics 
associated  with  points  p  and  p'. 
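
The correspondence measure and the intersection of normals can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the normals are assumed to be oriented consistently (both pointing towards the inside of the profile), and the simple minimum search in `best_match` stands in for the ordered traversal and consistency checks of the actual procedure.

```python
import math


def unit(v):
    """Scale a 2D vector to unit length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


def correspondence_measure(p, n, p2, n2):
    """M(p, p') = |V.N - V.N'|, with V the unit vector orthogonal to p - p'."""
    dx, dy = p[0] - p2[0], p[1] - p2[1]
    v = unit((-dy, dx))
    n, n2 = unit(n), unit(n2)
    return abs((v[0] * n[0] + v[1] * n[1]) - (v[0] * n2[0] + v[1] * n2[1]))


def normal_intersection(p, n, p2, n2):
    """Intersect the lines p + t*n and p2 + u*n2; None if they are parallel."""
    det = n[0] * (-n2[1]) - (-n2[0]) * n[1]
    if abs(det) < 1e-12:
        return None
    dx, dy = p2[0] - p[0], p2[1] - p[1]
    t = (dx * (-n2[1]) - (-n2[0]) * dy) / det
    return (p[0] + t * n[0], p[1] + t * n[1])


def best_match(p, n, candidates):
    """Return the (point, normal) candidate with the smallest measure."""
    return min(candidates,
               key=lambda c: correspondence_measure(p, n, c[0], c[1]))
```

For two points on the same circle with inward normals the measure is zero and the normals intersect at the circle centre, which is the symmetry set point.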

The  construction  of  the  whole  symmetry  set  proceeds  by  selecting  successive 
points  of  both  curves  and  generating  a  symmetry  set  point  from  each  as  described 
above.  Checks  are  included  to  ensure  that  symmetry  set  points  are  properly  ordered, 
and  that  no  pairing  of  profile  points  generates  more  than  one  symmetry  set  point. 

Examples  of  extracted  symmetry  sets  for  canal  surfaces  are  shown  in  figure  6. 
The  pipes  used  here,  and  in  subsequent  images,  have  piecewise  constant  radius  rather 
than constant radius. These belong to a larger class of surface, but provided the
radius  is  constant  in  the  neighbourhood  of  axis  inflections  (so  that  vanishing  lines 
can  be  determined)  the  theoretical  results  of  section  2.3.1  are  still  valid.  However, 
extracting  a  symmetry  set  is  more  complicated  in  this  case,  since  the  process  must 
cope  with  abrupt  changes  in  radius. 


3.3  Surfaces  of  Revolution 

The  extra  constraints  available  in  this  case  are  used  to  simplify  and  improve  the 
processing.  The  imaged  axis  is  computed  directly  from  the  profile,  without  fitting 
circles.  The  scaling  function  is  obtained  from  radii  of  bitangent  circles  with  centres 
constrained  to  the  axis. 

3.3.1  Calculation  of  the  axis 

Under weak perspective the two sides of the profile are related by an affine transformation
with three degrees of freedom [17]. These represent the skewed symmetry
line (2 DOF) and the correspondence direction (1 DOF)¹.

The affine transformation x' = Ax + b which relates corresponding points x' and x
on the two profile sides is determined by: first, obtaining an approximate solution by
determining point correspondences from bitangent contact points; second, numerical
minimization of the squared distances between one side of the profile and the other
transformed by {A, b}, measured at a number of points along the profile length.
Ten points give an excellent match for the entire profile. This determines point
correspondences between the two sides of the profile. It can be shown [17] that the
symmetry axis is given by

$$2 a_y x - 2 a_x y - a_y b_x + a_x b_y = 0$$

where a and b (b as above) are the eigenvectors of A (and have eigenvalues +1 and
−1 respectively).
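
As a check on the axis equation, note that the axis is the set of fixed points of x' = Ax + b: it passes through b/2 (since Ab = −b) and has direction a. The sketch below computes the line coefficients from a given {A, b}. It is an illustration under the stated eigenvalue assumptions, not the authors' code, and the function name `symmetry_axis` is hypothetical.

```python
def symmetry_axis(A, b):
    """Line (l1, l2, l3), with l1*x + l2*y + l3 = 0, fixed by x' = A x + b.

    Assumes A is 2x2 with eigenvalues +1 and -1, and that b is parallel
    to the eigenvector with eigenvalue -1.
    """
    # The eigenvector a for eigenvalue +1 spans the null space of (A - I),
    # so it is orthogonal to any non-zero row of (A - I).
    r1 = (A[0][0] - 1.0, A[0][1])
    r2 = (A[1][0], A[1][1] - 1.0)
    # Use the larger row for numerical safety.
    r = r1 if abs(r1[0]) + abs(r1[1]) >= abs(r2[0]) + abs(r2[1]) else r2
    ax, ay = -r[1], r[0]
    bx, by = b
    # 2*ay*x - 2*ax*y - ay*bx + ax*by = 0
    return (2.0 * ay, -2.0 * ax, -ay * bx + ax * by)
```

For a pure mirror reflection about the line x = 2, i.e. A = [[-1, 0], [0, 1]] and b = (4, 0), the returned line is proportional to x − 2 = 0.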

3.3.2  Calculation  of  the  scaling  function 

For each point, p, on one side of the profile, the corresponding point, p', on the other
side is determined using the affine transformation {A, b}. Circles, with centres constrained
to the symmetry axis, are fitted to the thirteen point profile neighbourhood
of p and p', using a modified version of Pratt’s circle fitting algorithm [22]. Example
circles are shown in figure 4. A method for measuring the scaling function is
described in section 4.1.2.

¹ If the image aspect ratio is correct, then under weak perspective the profile sides are related
by a mirror symmetry [19]. The restricted affine transformation is used (only one more parameter
in this case than a mirror symmetry) because the process is then tolerant to perspective effects as
described in section 2.3.2.

Figure 7: The transformation of a symmetry set into an affine canonical frame. (a)
Profile curves and symmetry set obtained from a weak perspective image. (b) Distinguished
lines and their intersections, the three distinguished points. The transformation
is found which maps the distinguished points, {D1, D2, D3}, onto the basis
points, {b1, b2, b3}. (c) The result of applying this transformation to the whole
symmetry set. The basis points are shown as crosses. There are six symmetry sets
superimposed here. These are extracted from weak perspective images with varying
viewpoint of pipe 1 (the three from figure 6 and three similar). They are virtually
identical, demonstrating the stability of the affine frame.


4  Viewpoint-Invariant  Representation:  The  Canonical  Frame  Construction

A canonical frame is a method of affine [14] or projective [23] curve normalization.
The normalization is achieved in two stages. First, a number of distinguished points
(or lines) are selected on the curve. Distinguished points are ones which can be located
before and after the transformation, such as corners (tangent discontinuities),
inflections (zeros of curvature), and bitangent contact points. Second, the canonical
frame is defined by selecting positions for a number of basis points or lines, for
instance the three vertices of an equilateral triangle in the affine case. The curve is
then transformed such that the distinguished points map to the basis points. All
curves which are equivalent up to an affine transformation map to the same curve.
In the projective case, four points or lines are required.
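
The affine case of the two-stage normalization can be sketched as below: three distinguished points define a unique affine map onto three basis points, and that map is then applied to every point of the curve. This is a minimal illustration; the function name `affine_from_triangles` is hypothetical, and the distinguished points are assumed to have been located already.

```python
def affine_from_triangles(D, B):
    """Return a function applying the affine map that sends D[i] to B[i].

    D: three distinguished points (x, y) located on the curve.
    B: three basis points (x, y) defining the canonical frame.
    """
    (x0, y0), (x1, y1), (x2, y2) = D
    (u0, v0), (u1, v1), (u2, v2) = B
    # Edge vectors of the source triangle.
    e1x, e1y, e2x, e2y = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    det = e1x * e2y - e2x * e1y
    if abs(det) < 1e-12:
        raise ValueError("distinguished points are collinear")
    # Edge vectors of the target (basis) triangle.
    f1x, f1y, f2x, f2y = u1 - u0, v1 - v0, u2 - u0, v2 - v0

    def apply(p):
        dx, dy = p[0] - x0, p[1] - y0
        # Coordinates of p in the (e1, e2) basis of the source triangle.
        s = (dx * e2y - e2x * dy) / det
        t = (e1x * dy - dx * e1y) / det
        return (u0 + s * f1x + t * f2x, v0 + s * f1y + t * f2y)

    return apply
```

Applying the returned function to each curve point gives the canonical frame curve; two curves related by an affine transformation yield the same result, provided the same distinguished points are selected in the same order.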

4.1  Affine  Frame 

4.1.1  Canal  surfaces 

The distinguished features used to define the canonical frame are the three straight
segments of the symmetry set (‘extended inflections’). The symmetry sets for the
objects in the model-base have at most four such lines: for any given pipe, the
same three must be selected. Ordering is provided simply by the symmetry set.
The pairwise intersections of these three lines generate three distinguished points, which are
mapped to vertices of an isosceles triangle in the canonical frame, as illustrated in
figures 7a-c. The basis points chosen do not differ much from the axis curves of our
example pipes; this avoids disproportionate enlargement or contraction of sections
of any symmetry set in the canonical frame. For weak perspective images, stability
over viewpoint is excellent; see figure 7c.

Figure 8: The affine basis for a surface of revolution is defined on the image axis
by two points, S1 and S2, which are the centres of circles tangent at the bitangent
contact points. These basis points are the projection of the centres of spheres tangent
to the surface at the circle of contact of the bitangent cone. Affine coordinates on
the axis defined with respect to these points are viewpoint-invariant.

4.1.2  Surfaces  of  revolution 

A  canonical  frame  in  this  case  simply  requires  an  affine  parametrization  for  the  axis, 
and  a  normalization  for  the  scaling  function.  The  following  construction  requires 
only  one  bitangent  on  the  profile.  An  affine  parameter  on  the  axis  requires  two  basis 
points;  these  are  the  centres  of  the  circles  bitangent  at  the  bitangent  line  contact 
points.  The  normalization  is  determined  by  setting  the  sum  of  the  circle  radii  to  a 
constant  value.  See  figure  8.  Stability  over  viewpoint  and  discrimination  between 
objects  are  demonstrated  in  figures  9  and  10  respectively. 

The  utility  of  the  scaling  function  as  a  representation  for  a  general  surface  of 
revolution  is  clearly  limited:  first,  because  self-occlusion  progressively  erases  the 
function;  second,  because  the  representation  is  biased  against  ‘spherical  surfaces’ — 
as  the  surface  of  revolution  approaches  a  sphere,  the  symmetry  set  reduces  to  a 
point,  and  the  scaling  function  to  a  single  value. 


4.2  Projective  Frame 

In  general,  an  axis  curve  can  only  be  recovered  up  to  a  projective  transformation 
from  a  perspective  image.  However,  for  canal  surfaces,  the  vanishing  line  of  the  axis 
curve  plane  can  be  determined  from  profile  inflection  tangents  by  the  construction 
of  figure  3.  If  the  vanishing  line  is  sent  to  infinity,  the  canonical  frame  curve  is  again 
within  an  affine  transformation  of  the  axis,  i.e.  affine  properties  can  be  measured 
from  the  perspective  image!  The  necessary  projective  transformation  is  computed 
from the following two requirements: first, the vanishing line maps to infinity (i.e.
points on it acquire a zero third homogeneous coordinate); second, the three distinguished points
are mapped to the affine canonical frame basis points.
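
The first requirement can be illustrated with a standard rectifying matrix whose last row is the vanishing line; the second requirement is then met by composing with an affine map onto the basis points. The sketch below covers only the first step. The function names are hypothetical, and it assumes the vanishing line does not pass through the image origin (l3 ≠ 0).

```python
def rectifying_homography(l):
    """3x3 matrix H sending the image line l = (l1, l2, l3) to infinity.

    Assumes l3 != 0, i.e. the line does not pass through the image origin.
    """
    l1, l2, l3 = l
    if abs(l3) < 1e-12:
        raise ValueError("line passes through the origin; shift coordinates")
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [l1 / l3, l2 / l3, 1.0]]


def apply_h(H, p):
    """Apply H to an image point (x, y); returns homogeneous (X, Y, W)."""
    x, y = p
    return tuple(H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))


def to_xy(P):
    """Convert a finite homogeneous point (X, Y, W) to (x, y)."""
    return (P[0] / P[2], P[1] / P[2])
```

Any point on the vanishing line is mapped to W = 0, and image lines that meet on the vanishing line become parallel after the transformation, so the rectified curve is within an affine transformation of the axis.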

Figure 9: Top row: weak perspective images of the same vase taken from different
viewpoints. Self-occlusion increases left to right. Second row: extracted profiles.
Third row: scaling plotted against an affine axis parameter. Cusps in the graph are
generated at points where the profile evolute [6] crosses the symmetry axis. The
comparison of the three canonical curves demonstrates stability over viewpoint. Differences
arise from self-occlusion which progressively erodes the function from the
left.

Figure 10: Graphs of scaling against an affine axis parameter. The representation
clearly discriminates between the two vases.


Figure 11: (a) Affine canonical frame for symmetry sets from six weak perspective
images (as in figure 7c) and seven images with significant perspective effects. Note
the distortion for the latter curves. (b) Projective canonical frame for the thirteen
symmetry sets for the same pipe used in (a). The curves are virtually identical,
eliminating the perspective distortion shown in the affine frame. (c) Superimposed
symmetry sets of pipes 1 and 4 (pipe 4 shown in figure 12) in a projective canonical
frame. In each case there are twelve curves. This demonstrates discrimination
between objects from their canonical frame curves. The difference between the two
curve sets is measured using a statistical classifier.

Figure 12: Example images and projective canonical frames for three canal surfaces,
pipes 2, 3 and 4. Note the wide variation in viewing position. From top to
bottom, the canonical frames contain symmetry sets generated from six, five and
twelve images respectively. At least half of each set show significant perspective
distortions; note the variation in pipe width in the middle column due to perspective.
The end-diameter of pipe 2 is 15mm, the same diameter as pipe 1; pipes 3
and 4 both have an end-diameter of 22mm. The slight instability present towards
the ends of the canonical frame curves is due to errors in the extracted symmetry
set which occur where the pipe radius changes.


Figure 13: The construction used to obtain the invariant line lengths L in the canonical
frame. Nine rays are cast from a point mid-way between two of the distinguished
points. The value recorded is the length of the ray. The rays have equal angular
separation.


               Invariant 1          Invariant 2          Invariant 3
               mean      s.d.       mean      s.d.       mean      s.d.
pipe 1 (13)    -21.988   0.1800     130.513   0.1671     -22.531   0.1417
pipe 2 (6)     -21.459   0.0998     129.530   0.1514     -22.410   0.1938
pipe 3 (5)     -23.981   0.1383     130.336   0.1564     -22.089   0.1631
pipe 4 (12)    -24.442   0.1879     130.042   0.1773     -22.592   0.1781

Table 1: Invariants computed using a Fisher linear discriminant based on the canonical
frame measures shown in figure 13. The bracketed numbers after each pipe give
the number of images contributed for that pipe to the computation of the Fisher
discriminant matrix.


Stability  is  poor  if  an  affine  frame  (three  symmetry  set  lines)  is  used  for  perspec¬ 
tive  images;  see  figure  11a.  However,  if  a  projective  frame  (three  symmetry  set  lines 
and  the  vanishing  line)  is  used,  stability  is  excellent;  see  figure  11b.  This  stability 
of  canonical  frame  curves  is  an  indirect  indication  of  the  planarity  of  the  axis  curve. 
Examples  of  projective  frames  for  other  pipes  and  discrimination  between  them  are 
given  in  figures  11c  and  12. 


5  Viewpoint-Invariant  Measurements:  Invariants

The  previous  section  described  the  computation  of  viewpoint-invariant  curves.  Here, 
we  compute  invariants  from  the  curves.  An  invariant  is  a  number  whose  value  is 
unaffected  by  viewpoint. 

Invariants are obtained from measurements on the canonical frame curve illustrated
in figure 13. This construction [24] is similar to the footprints of Lamdan et
al. [14], although here lengths are measured rather than areas. The vector of invariant
line lengths L is not used directly for discrimination. Instead, an index vector M
is constructed from L using a statistical classifier over all extracted canonical frame
curves. There are two advantages of this: first, the index is more discriminating
than ‘raw’ lengths; second, the dimension of the index will generally be less than
the dimension of L, simplifying its use in a recognition system.

Figure 14: Scatter plot of the first two invariants computed from the pipe images.
Pipes 1-4 are represented by triangles, squares, circles and stars respectively. This
demonstrates the sensitivity of the discriminant: the canonical frame curves for
pipes 1 & 2 are almost identical.

The Fisher linear discriminant [10], which is an optimal linear classifier, is used
for the computation of the index. The discriminant minimises the ‘intra-class variance’
(that is over several examples of the same pipe) and maximises the ‘inter-class
separation’ (that is for different pipes). It does so by transforming to a new, orthogonal
basis, M = E L, where the matrix E is computed from feature measurements
taken over all the canonical frames. The invariants are the components of the vector
M.
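
To make the idea concrete, the sketch below computes the Fisher direction for the simplest possible case: two classes of two-dimensional measurements. This is a deliberate simplification and not the multi-class computation of the matrix E used for the pipes; the function name `fisher_direction` is hypothetical. The direction is w = Sw⁻¹(m_a − m_b), where Sw is the pooled within-class scatter and m_a, m_b are the class means.

```python
def fisher_direction(class_a, class_b):
    """Two-class Fisher discriminant direction for 2-D feature vectors.

    Returns w = Sw^{-1} (mean_a - mean_b), with Sw the pooled
    within-class scatter matrix.
    """
    def mean(pts):
        n = float(len(pts))
        return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)

    def scatter(pts, m):
        sxx = sum((p[0] - m[0]) ** 2 for p in pts)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
        syy = sum((p[1] - m[1]) ** 2 for p in pts)
        return sxx, sxy, syy

    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # Multiply the 2x2 inverse of Sw by the difference of class means.
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
```

Projecting each measurement onto w gives a single number per example; the direction favours components in which the classes differ by much more than the spread within each class.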

Values of these invariants and their standard deviations for the four pipes of figures 6 and 12
are given in table 1. The scatter plot of figure 14 demonstrates that the four pipes
could almost be distinguished solely using the first two invariants. In practice three
invariants reliably (up to two standard deviations) distinguish all the examples of
figures 11b and 12.


6  Recognition 

Object representation up to a linear transformation is sufficient for recognition.
Indeed, a model-based recognition system has been built for planar objects using
only projective properties [24]. The same system architecture can be used to build
a recognition system based on the GC invariants.

Figure 15: Viewpoint-invariant recognition. (a) The image contains one pipe in
the model library (the large pipe on the left), and two others that are not in the
library, as well as other objects. (b) The pipe in the library is correctly recognised
and identified as pipe 2. The black curve shows the projected model symmetry set.
The other pipes are not recognised. All processing is automatic. Note the significant
perspective distortion of the left pipe.

There  are  two  stages  in  building  a  recognition  system:  first,  acquisition,  where 
canonical  frame  curves,  and  their  invariants,  are  stored  in  a  model  library;  second, 
recognition,  where  the  system  identifies  which  model  (if  any)  is  in  a  perspective 
image. 

Recognition proceeds by: (1) selecting curves that potentially belong to
pipes by locating consistent sets of straight line pairs, and discarding other curves;
(2) computing symmetry sets for all selected curve pairs; (3) computing invariants
for each symmetry set curve; (4) using the invariants as an index to access the model
library (if an invariant value corresponds to one of the library values, a recognition
hypothesis is generated); (5) verifying the recognition hypothesis by comparing the
target symmetry set to a library curve.
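
Step (4) can be sketched as a lookup against stored means and standard deviations. The values below are read from table 1, assuming its flattened columns list pipes 1 to 4 in order; the linear scan, the names `LIBRARY` and `match`, and the two standard deviation tolerance applied to every invariant are illustrative assumptions rather than a description of the actual indexing structure.

```python
# (mean, s.d.) of invariants 1-3 for each pipe, as read from table 1.
LIBRARY = {
    "pipe 1": [(-21.988, 0.1800), (130.513, 0.1671), (-22.531, 0.1417)],
    "pipe 2": [(-21.459, 0.0998), (129.530, 0.1514), (-22.410, 0.1938)],
    "pipe 3": [(-23.981, 0.1383), (130.336, 0.1564), (-22.089, 0.1631)],
    "pipe 4": [(-24.442, 0.1879), (130.042, 0.1773), (-22.592, 0.1781)],
}


def match(index, library=LIBRARY, n_sigma=2.0):
    """Return the models whose stored invariants all lie within n_sigma
    standard deviations of the measured index vector."""
    hits = []
    for name, stats in library.items():
        if all(abs(v - mean) <= n_sigma * sd
               for v, (mean, sd) in zip(index, stats)):
            hits.append(name)
    return hits
```

Each returned name is only a recognition hypothesis; it would still have to pass the verification of step (5).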

A  recognition  system  has  been  built  which  can  identify  a  pipe  from  a  perspective 
image.  The  image  can  contain  several  objects  from  the  library  as  well  as  other 
unmodelled  objects  (‘clutter’),  and  the  viewpoint  is  unconstrained.  At  present  the 
model  library  contains  four  pipes.  Recognition  examples  are  shown  in  figures  15 
and  16. 


7  Discussion 

We  have  demonstrated  a  viewpoint-invariant  representation  for  GCs  of  a  certain 
class  S.  The  representation  is  stable  over  viewpoint,  discriminates  between  objects, 
and  can  be  reliably  extracted  from  images.  Invariants  based  on  the  representation 
have  been  successfully  used  as  index  functions  in  a  model-based  recognition  system. 

Figure 16: (a) The image contains two pipes in the model library as well as other
objects; pipe 1 on the left, pipe 4 on the right. (b) Both pipes are correctly recognised.


Although  theoretically  correct  only  for  precise  envelopes  of  spheres,  the  methods 
are  not  too  sensitive  to  deviations  from  the  ideal.  For  example,  the  cross-section 
of  pipe  1  is  actually  elliptical,  with  an  aspect  ratio  varying  between  0.89  and  1.0, 
rather  than  uniformly  circular.  Clearly,  because  of  self-occlusion,  the  representation 
is  less  useful  for  surfaces  of  revolution. 

We are currently enlarging the model library, and investigating invariants which
are less sensitive to missing contours. Typically contour segments are missing due
to: feature ‘drop out’, occlusion from other objects, and self-occlusion. There are a
number of alternative methods for extracting affine invariants from the remaining
symmetry set curve portions. In particular, semi-differential curve invariants [29] do
not require such a rich curve geometry as that required for a canonical frame, since
fewer inflections are needed.

Abstract

We describe a model-based recognition system, called LEWIS, for the identification of planar objects based
on  a  projectively  invariant  representation  of  shape.  The  advantages  of  this  shape  description  include  simple 
model  acquisition  (direct  from  images),  no  need  for  camera  calibration  or  object  pose  computation,  and  the 
use  of  index  functions.  We  describe  the  feature  construction  and  recognition  algorithms  in  detail  and  provide 
an  analysis  of  the  combinatorial  advantages  of  using  index  functions.  Index  functions  are  used  to  select 
models  from  a  model  base  and  are  constructed  from  projective  invariants  based  on  algebraic  curves  and  a 
canonical  projective  coordinate  frame.  Examples  are  given  of  object  recognition  from  images  of  real  scenes, 
with  extensive  object  libraries.  Successful  recognition  is  demonstrated  despite  partial  occlusion  by  unmodelled 
objects,  and  realistic  lighting  conditions. 

1  Introduction 

1.1  Overview 

In the context of this paper, recognition is defined as the problem of assigning the correct label to an object
seen  in  a  perspective  view.  Recognition  is  considered  successful  if  the  2D  geometric  configuration  of  an 
object  in  an  image  can  be  explained  as  a  perspective  projection  of  a  geometric  model  of  the  object.  In  this 
paper  we  restrict  ourselves  to  planar  objects,  although  many  man-made  3D  objects  can  be  decomposed  into 
recognisable  planar  patches. 

A key aspect of the system is the use of the projective transformation group to represent perspective
image projections. Most object recognition systems use approximations to perspective, such as affine or
orthographic camera models. Such approximations are often valid, but viewing conditions where depth
variation of the object is significant compared to viewing distance, or those that involve a wide viewing
angle, require a complete representation of the effects of perspective image formation.

Perspective, or central projection, does not exhibit the full range of geometric transformation possible
under the projective model. For example, convexity is preserved under perspective projection (so long as the
imaged object does not intersect the image plane), but not under full projective transformation. However,
the convenience of homogeneous coordinates, the consequent linearity of projective transformations, and the
associated group properties motivate our use of projective geometry throughout. The restrictions associated
with perspective are introduced in the recognition process as part of hypothesis confirmation. It should
also be noted that the parameters associated with internal camera calibration are implicit in projective
projection, so object description and recognition are not dependent on camera geometry.

The recognition system, which is called LEWIS (Library Entry Working through an Indexing Sequence),
is designed around the use of invariant indexing functions to represent each object class. An invariant is
defined as a function which measures some geometric properties of an object but whose value is independent
of projective frame. These indexing functions are computed from the geometric coordinates or coefficients
of a small group of image features, such as points, lines and conics. The emphasis is on efficiently indexing
a large model library, where the index keys are constructed from invariant function values. In practice,
the index derived from one view (a model acquisition view) can be used to access the object model in any
subsequent view.

Since these indexes can be derived from any viewpoint, it follows that any unoccluded view of the object
can serve as a model. We derive the invariant values for the library from a typical view and also include
information which is needed for verification, such as the main geometric features of the object and the
bounding box of the features. It is beneficial to acquire the model directly from an image since the resulting
geometry reflects the actual shape of the object, including rounded corners and other manufacturing artifacts.

Invariant indexing functions are derived according to two approaches. In the first approach, algebraic
invariants are based on classical results derived from the projective geometry of algebraic curves [Semple52].
The fundamental invariant in projective geometry is the cross ratio, which is defined for four collinear points
in terms of ratios of distances between the points. A similar invariant can be defined for four lines concurrent
at a single point. More general algebraic invariants can be derived from configurations of conics, points and
lines. For example, a cross ratio can be generated from two points and a conic; this arises because the line
passing through the two points intersects the conic in two other collinear points. These and other algebraic
invariants will be discussed in detail in section 2.2.

The second approach to the construction of invariant indexing functions is the use of projective coordinate
frames. In the projective plane, four points, no three of which are collinear, define unique projective
coordinates for any other point in the plane. These projective coordinates are invariant to any projective
transformation of the plane. We can define a particular frame, usually a fronto-parallel view, which we call
the canonical frame. Invariant indexes are constructed from a sample of points on the boundary of the object
when projected onto the canonical frame. The advantage of the canonical frame construction is that the
object boundary does not have to be an algebraic curve.

These  ideas  have  been  incorporated  into  a  complete  recognition  system  over  the  past  four  years.  The 
system,  LEWIS,  has  been  tested  on  a  large  set  of  images  and  under  varying  levels  of  occlusion  and  clutter. 
The major issues which have been examined in the evaluation of LEWIS are:

1.  The  dependence  of  recognition  complexity  on  the  number  of  models  in  the  database. 

2.  The  discrimination  power  of  projective  invariant  descriptions,  particularly  in  the  presence  of  clutter 
and  occlusion. 

3.  The  effect  of  illumination,  object  surface  properties  and  feature  segmentation  on  invariant  values. 

4.  The  practicality  of  constructing  object  models  directly  from  an  example  object  view. 

We  will  explore  these  issues  in  later  sections,  but  first  it  will  prove  useful  to  establish  the  framework  for 
object  recognition.  In  particular,  we  establish  the  benefits  of  a  model  library  accessed  by  invariant  keys. 

1.2  Recognition  Framework 

Recognition consists of two processes: the first is the identification of which object is potentially present in the
scene;  and  the  second  is  the  establishment  of  a  correspondence  between  the  image  and  the  identified  model 
features.  Often,  these  processes  are  not  distinct,  though  together  they  can  be  partitioned  into  three  stages 
that  should  be  contained  within  any  recognition  system  (these  are  similar  to  those  defined  in  [Grimson90], 
p.33): 

Grouping: what subset of the data belongs to a single object?

Indexing:  which  object  model  projects  to  this  data  subset? 

Verification:  how  much  image  support  is  there  for  this  correspondence? 

Naturally,  these  stages  represent  an  idealised  decomposition;  robust  recognition  generally  requires  numerous 
interactions  between  the  stages.  However,  this  structure  yields  a  productive  framework  for  defining  and 
measuring  the  general  characteristics  of  recognition  systems. 

The aim of grouping (also called perceptual organisation [Lowe85], selection, or figure-ground discrimination)
is to provide an association of features that are likely to have come from a single object in a scene. Image
features which are exploited in grouping cover all levels of image segmentation, for instance: edgels; corners;
algebraic features such as lines and conics; smooth curves represented as splines; and feature descriptions
based on regions, such as texture. These features are typically grouped together using cues such as proximity,
parallelism [Binford81, Lowe85], collinearity and approximate continuity in curvature [Cox92, Shaashua88].
In the work reported here, we exploit many of these techniques to generate feature groups from which
invariant index functions are constructed.

Indexing addresses the problem of model hypothesis generation. For a small number of models, for
example two or three, it is reasonable to try simply to find image feature support for each model. This
approach is typical of many existing systems [Ayache86, Ayache87, Grimson90, Huttenlocher88, Lowe87,
Murray87,  Pollard89].  As  the  size  of  the  model  library  increases,  this  approach  becomes  computationally 
too  expensive.  It  is  then  more  effective  to  choose  potential  models  from  the  library  based  on  the  observed 
image  features.  That  is,  image  feature  measurements  are  used  to  index  into  the  model  base.  The  work 
presented  in  this  paper  demonstrates  that  efficient  indexing  strategies  can  be  constructed,  and  that  through 
using  them,  dramatic  improvements  in  hypothesis  generation  efficiency  can  be  achieved. 

The  final  stage  is  verification.  Grouping  and  indexing  have  hypothesised  a  match  between  an  object  and  a 
small  number  of  image  features.  This  match  is  used  to  project  the  model  onto  the  image.  The  validity  of  the 
model  hypothesis  and  model-to-image  feature  correspondences  is  determined  by  searching  for  image  features 
that  have  not  been  used  in  the  construction  of  indexes.  These  are  features  that,  for  instance,  have  been 
missed  by  the  grouping  stage.  The  more  features  that  can  be  found  which  are  close  to  the  projected  model 
boundaries,  the  more  likely  it  is  that  the  initial  hypothesis  is  correct.  Once  all  possible  correspondences  have 
been  accepted  or  ruled  out,  a  conclusion  as  to  the  identity  of  an  object  can  be  made.  Generally,  a  hypothesis 
is  considered  successful  if  the  error  between  projected  model  features  and  corresponding  image  features  is 
below  some  threshold  and  a  reasonable  fraction  of  the  object  outline  is  covered  by  image  features. 
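The acceptance test described above can be sketched as follows. This is a minimal illustration, assuming the projected model outline and the image features are both available as lists of 2D points; the function names, the distance tolerance and the minimum coverage fraction are hypothetical and are not values used in LEWIS.

```python
import math


def outline_coverage(projected_outline, image_features, tol):
    """Fraction of projected model outline points with an image feature within tol pixels."""
    supported = 0
    for px, py in projected_outline:
        if any(math.hypot(px - fx, py - fy) <= tol for fx, fy in image_features):
            supported += 1
    return supported / len(projected_outline)


def accept_hypothesis(projected_outline, image_features, tol=1.0, min_fraction=0.5):
    """Accept when enough of the projected outline is supported by nearby image features."""
    return outline_coverage(projected_outline, image_features, tol) >= min_fraction


# Hypothetical data: a projected square outline and four detected features,
# three of which lie close to outline points and one of which is clutter.
outline = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
features = [(0.5, 0.2), (9.8, 0.1), (10.3, 9.9), (50.0, 50.0)]

coverage = outline_coverage(outline, features, tol=1.0)
accepted = accept_hypothesis(outline, features, tol=1.0, min_fraction=0.5)
```

A real system would measure support along curve segments rather than at isolated points, but the two thresholds play the same roles as the error bound and the outline fraction in the text.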

There  are  three  distinct  algorithms  that  have  been  used  to  compute  correspondence.  In  the  first  approach, 
interpretation trees [Ayache87, Brooks83, Ettinger88, Fisher89, Grimson87, Murray87, Pollard89, Reid91],
the set of correspondences is grown incrementally according to a branch and bound search algorithm. Features
are added according to their consistency with a model hypothesis associated with each node in the
graph.  Consistency  is  also  a  function  of  the  specific  set  of  features  defined  by  the  path  from  the  root  of  the 
search  tree  to  the  current  node. 

The second approach, hypothesise and test [Ayache86, Bolles87, Goad83, Huttenlocher87, Lowe87], generates
model hypotheses exhaustively from the library, although properties of small feature groups can be
used to suggest initial trial feature correspondences. These hypotheses are tested by establishing a
model-dependent and priority-ordered checklist of other features which must be present to satisfy the hypothesis.
A set of focus features is defined for each object; these are easily extracted and also provide maximum
discrimination  among  object  classes. 

The third approach, pose clustering [Cass92, Stockman87, Thompson87], uses the concept of pose consistency
to generate hypotheses. An object is projected onto an image under a single transformation acting
on all points of the object. The image projection of an object is composed of a 3D Euclidean transformation
(called pose), followed by a perspective mapping. The 3D pose can be computed from various model-to-image
feature correspondences and should be the same for all correct correspondences from a single object.
The search for correct correspondences is then the problem of finding clusters in pose space.

The recognition system reported here shares many characteristics with these approaches, particularly in
the  stages  of  feature  grouping  and  hypothesis  testing,  but  differs  considerably  in  how  model  hypotheses  are 
generated  and  feature  correspondences  are  established.  Our  approach  to  these  stages  centres  on  the  use  of 
index  functions  that  we  now  define  more  formally. 

1.3  Indexing  Functions 

The  concept  of  the  indexing  function  can  be  developed  formally  as  follows:  the  index  is  considered  to 
be  a  vector,  M,  which  selects  a  particular  model  from  the  library.  Each  model  consists  of  the  set  of 
significant  geometric  features  of  the  object  boundary  as  well  as  ancillary  information  required  for  hypothesis 
confirmation, such as the bounding box of the features, pixel chains from which the boundary features are
constructed,  and  perhaps  texture  or  other  details  of  the  object  surface  properties. 

The model index is a function only of a set of projected model features, F; that is, M can be computed from
any image projection of the model features. The practical consequence is that models can be constructed
simply by acquiring one or a few image views of the object in isolation. If $F_{model}$ is the set of features
actually on an object, and T is the transformation from the object in an arbitrary pose onto the camera,
then:

$$M(T(F_{model})) = M(F_{model}).$$

This  equation  states  that  the  index  function  is  (scalar)  invariant  [Forsyth91]  to  transformations  of  the 
object  which  result  from  different  viewpoints.  In  the  results  reported  here,  the  index  functions  are  invariant 
to  projective  transformations  of  the  image  plane.  Each  element  of  the  index  vector  M  is  an  invariant  measure 
computed  from  a  group  of  model  features  such  as  conics,  lines,  points  and  plane  curve  segments.  Ideally, 
the  index  function  should  uniquely  retrieve  a  model  from  the  library,  but  in  practice  it  is  likely  that  a  small 
number  of  models  are  retrieved  with  the  same  index.  Even  so,  the  search  cost  is  considerably  reduced  below 
that  of  testing  every  member  of  the  library. 

The  concept  of  an  indexing  function  described  above  assumes  that  both  the  indexes  for  the  model  and 
for the object can be measured perfectly in a scene. In practice, the measurements are imprecise due
both to modeling and imaging errors^1. It is therefore necessary to provide a range of invariant values in
the construction of the index function. In LEWIS, the range is established by quantising the index space
according to the observed variation in invariant values due to the effects just mentioned. The quantised
index value is denoted by Q and a quantisation is selected so that,

$$Q(M(F_{model}) + E_{model}) = Q(M(F_{image}) + E_{image}).$$

1 This is not just due to random image noise, which is often considered to be the sole cause of error, but also due to events
in the image which are not modeled or expected. Examples of unmodeled image events are: specularities; surface texture; and
impinging objects.

Note that the quantisation function Q is the same for both the model and the image. This is a direct result of
being able to acquire models from images. The error characteristics $E_{model}$ and $E_{image}$ can also be assumed
to be the same.
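A quantised index of this kind can be sketched as a hash table keyed on bin indices. The bin width, the table layout and the model names below are assumptions chosen for illustration (the sample values are the five-line and conic-pair invariants tabulated later in the paper); the actual quantisation used in LEWIS is derived from observed measurement variation and is not reproduced here.

```python
import math
from collections import defaultdict

# Assumed bin width for each component of the invariant vector.
BIN_WIDTHS = (0.05, 0.05)


def quantise(invariants, bin_widths=BIN_WIDTHS):
    """Map a vector of invariant values to a tuple of integer bin indices (the key Q)."""
    return tuple(math.floor(v / w) for v, w in zip(invariants, bin_widths))


library = defaultdict(list)


def add_model(name, invariants):
    """Store a model under the quantised index measured in its acquisition view."""
    library[quantise(invariants)].append(name)


def lookup(invariants):
    """Return the models whose quantised index matches the image measurement."""
    return library.get(quantise(invariants), [])


# Acquisition: invariants measured on the objects themselves.
add_model("bracket", (0.840, 1.236))
add_model("computer_tape", (3.073, 3.082))

# Recognition: invariants measured in a perspective view of the bracket.
hypotheses = lookup((0.843, 1.234))
```

Because the same function `quantise` is applied at acquisition and at recognition time, a model is retrieved whenever both measurements fall in the same bin. Values near a bin boundary can fall on either side of it, so a practical table would typically also probe neighbouring bins or use overlapping ranges.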

Other  recognition  systems  have  also  exploited  index  functions  based  on  invariants.  A  system  using 
projective invariants is described by Nielsen for identifying and tracking mobile robots [Nielsen88]. Early
versions of the system described here are reported by Forsyth, et al. [Forsyth91]. Indexing functions based
on affine invariants have formed the basis for a number of planar object recognition systems, for instance the
series of papers by Kalvin et al. [Kalvin86, Schwartz87, Lamdan88], Wayner [Wayner91], Clemens and Jacobs
[Clemens91], Huttenlocher [Huttenlocher91], Taubin and Cooper [Taubin91]. Other avenues of invariant
research have been covered by Weiss [Weiss88], Stein and Medioni [Stein92], Califano and Mohan [Califano92],
Gueziec and Ayache [Gueziec93], and Rigoutsos and Hummel [Rigoutsos91].

It  is  also  possible  to  gain  some  of  the  advantages  of  indexing  without  using  index  functions  that  are 
strictly  invariant.  For  example,  Jacobs  describes  an  approach  to  indexing  3D  objects  using  one  parameter 
families  of  index  values  in  image  transform  space.  One  can  then  select  models  based  on  proximity  to  these 
index  sets  [Jacobs92].  Another  approach  is  the  use  of  quasi-invariants  [Binford93],  where  functions  are 
constructed  that  are  not  invariant  under  general  perspective  viewing,  but  are  reasonably  constant  over  most 
practical viewing conditions. The quasi-invariants that have been suggested are invariants of more restricted
transformation  groups  such  as  affine  and  equiform  (scaled  Euclidean).  Affine  transformations  apply  when 
the  depth  change  along  the  object  plane  is  small  compared  to  the  distance  from  the  center  of  perspective. 
The  equiform  case  occurs  when  the  object  plane  is  parallel  to  the  image  plane. 

1.4  Outline  of  the  Paper

Section  2  introduces  the  notation  used  in  the  rest  of  the  paper,  defines  the  algebraic  and  canonical  frame 
invariants,  and  describes  the  segmentation  and  grouping  procedures  used  in  LEWIS.  Section  3  surveys  the 
recognition  architecture  with  results  and  statistics  given  for  the  systems  working  on  real  images.  Finally, 
section  4  highlights  weaknesses  in  the  current  approach  and  suggests  directions  for  future  research. 


2  Invariant  Indexing  Functions 

2.1  Notation 

When homogeneous coordinates are used, points on the plane are represented by a triple
$\mathbf{x} = (x_1, x_2, x_3)^T = (\lambda x, \lambda y, \lambda)^T$, where $(x, y)^T$ are the standard Euclidean plane coordinates of the point and $\lambda$ is an arbitrary
(non-zero) projective scale factor. Points in the projective plane are equivalent for all values of $\lambda$. Lines are
defined by $\mathbf{l} = (a, b, c)^T = (\mu \sin\theta, -\mu \cos\theta, \mu d)^T$, where $\theta$ is the orientation of the line with respect to the x
axis and d is the perpendicular distance of the line from the origin; $\mu$ is the projective scale factor for lines.
The incidence of a point and a line in the projective plane is given by $a x_1 + b x_2 + c x_3 = 0$, or in vector
notation, $\mathbf{l} \cdot \mathbf{x} = 0$.
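The homogeneous representation makes joins and intersections simple to compute. The sketch below uses the standard facts that the line through two points, and the point common to two lines, are both given by a vector cross product; the helper names are introduced here purely for illustration.

```python
def cross(u, v):
    """Vector cross product of two homogeneous 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product; zero when a point is incident with a line."""
    return sum(a * b for a, b in zip(u, v))


def to_homogeneous(x, y, scale=1.0):
    """Represent the Euclidean point (x, y) with an arbitrary non-zero scale factor."""
    return (scale * x, scale * y, scale)


def to_euclidean(p):
    """Recover (x, y) from a homogeneous point that is not on the ideal line."""
    return (p[0] / p[2], p[1] / p[2])


p = to_homogeneous(1.0, 2.0)
q = to_homogeneous(3.0, 5.0, scale=2.0)   # same point as (3, 5, 1)
line = cross(p, q)                         # line joining p and q
r = to_homogeneous(5.0, 8.0)               # a third point on the same line

# Intersection of the lines x = 2 and y = 3.
meet = to_euclidean(cross((1.0, 0.0, -2.0), (0.0, 1.0, -3.0)))
```

The scale factor applied to `q` has no effect on the incidence tests, which is the equivalence over all values of the scale factor stated above.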

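A conic is the set of points $(x_i, y_i, 1)^T$ that satisfy:

$$a x_i^2 + b x_i y_i + c y_i^2 + d x_i + e y_i + f = 0. \tag{1}$$

A more convenient representation of a conic uses a planar point x and a quadratic form C:

$$\mathbf{x}^T C \mathbf{x} = 0, \tag{2}$$

where:

$$C = \begin{pmatrix} a & b/2 & d/2 \\ b/2 & c & e/2 \\ d/2 & e/2 & f \end{pmatrix}. \tag{3}$$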
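The equivalence of the polynomial form and the quadratic form can be checked numerically. The sketch below assumes the coefficient ordering (a, b, c, d, e, f) of equation 1 and builds the symmetric matrix with the off-diagonal coefficients halved; the function names are illustrative only.

```python
def conic_matrix(a, b, c, d, e, f):
    """Symmetric 3x3 matrix of the conic a x^2 + b x y + c y^2 + d x + e y + f = 0."""
    return [[a, b / 2.0, d / 2.0],
            [b / 2.0, c, e / 2.0],
            [d / 2.0, e / 2.0, f]]


def quadratic_form(C, x):
    """Evaluate x^T C x for a homogeneous point x."""
    return sum(x[i] * C[i][j] * x[j] for i in range(3) for j in range(3))


# The unit circle x^2 + y^2 - 1 = 0.
C = conic_matrix(1.0, 0.0, 1.0, 0.0, 0.0, -1.0)

on_conic = quadratic_form(C, (0.6, 0.8, 1.0))    # a point on the circle
rescaled = quadratic_form(C, (1.2, 1.6, 2.0))    # same point, different scale
off_conic = quadratic_form(C, (1.0, 1.0, 1.0))   # a point outside the circle
```

The quadratic form vanishes for any homogeneous representative of a point on the conic, so the test is independent of the projective scale factor.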


From  now  on,  typewriter  font  denotes  matrices,  bold  letters  denote  vectors,  large  letters  denote  model  objects 
and  small  letters  denote  image  objects.  For  example,  C  is  a  model  conic,  X  a  model  point,  and  c  and  x  their 
images  in  a  view  where  recognition  is  to  be  achieved. 


2.1.1  Projective  Transformations 

A  projective  transformation  T  between  two  planes  is  represented  as  a  3  x  3  matrix  acting  on  homogeneous 
coordinates  of  the  plane.  It  is  a  linear  mapping  on  homogeneous  points.  A  homogeneous  representation 
means  that  only  ratios  of  matrix  elements  are  significant,  and  consequently  the  transformation  has  8  degrees  of 
freedom.  Under  imaging,  this  transformation  models  the  composed  effects  of  3D  rigid  rotation  and  translation 
of  the  world  plane  (camera  extrinsic  parameters),  perspective  projection  to  the  image  plane,  and  an  affine 
transformation  of  the  final  image  which  covers  the  effects  of  camera  intrinsic  parameters.  The  effects  of 
radial  distortion  due  to  the  camera  lens  are  not  modeled. 

All  of  the  parameters  of  these  separate  transformations  cannot  be  recovered  uniquely  from  a  single  3x3 
matrix,  since  there  are  6  unknown  pose  parameters,  and  5  unknown  internal  camera  parameters  (these  are 
camera  centre,  focal  length,  aspect  ratio  and  the  angle  between  the  coordinate  axes  of  the  image  plane).  For 
plane  to  plane  perspective  transformations,  there  are  therefore  11  unknowns  but  only  8  constraints.  Fortu¬ 
nately,  the  invariant  description,  and  model  projection  used  in  the  recognition  system,  do  not  require  explicit 
knowledge  of  either  the  pose  or  the  internal  camera  parameters.  We  need  solve  only  for  the  independent 
parameters  of  the  projective  transformation  T.  Note  that  projectivities  form  a  group,  and  so  most  notably 
every  action  has  an  inverse  and  the  composition  of  two  projectivities  is  also  a  projectivity.  Consequently, 
two  images  from  different  viewpoints  of  the  same  planar  object  are  always  related  by  a  projectivity. 

The mapping of four points between two planes, of which no three points are collinear, is sufficient to
determine the transformation matrix T. Each point provides two linear constraints on the transformation
parameters, therefore four independent points provide the required 4 × 2 = 8 constraints. Corresponding
points $(x_i, y_i)$ and $(X_i, Y_i)$ are represented by homogeneous 3-vectors $(x_i, y_i, 1)^T$ and $(X_i, Y_i, 1)^T$. The
projective transformation x = TX, ($|T| \neq 0$) is:

$$k_i \begin{pmatrix} x_i \\ y_i \\ 1 \end{pmatrix} = \begin{pmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{pmatrix} \begin{pmatrix} X_i \\ Y_i \\ 1 \end{pmatrix}$$
where $k_i$ is an arbitrary non-zero scalar. Note that in using this formulation we are unable to deal with plane
points lying on the ideal line; this is, however, unimportant as in practice all of the points to be transformed
lie within the finite and bounded image plane. For N > 4 points, singular value decomposition can be used
to compute T. The computation can be formulated as minimising ||At|| subject to ||t|| = 1, where t is the
nine-element vector of transform parameters and A is a 2N × 9 matrix (two rows per point correspondence)
of elements formed from the coordinates of the matched image and model points.
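For exactly four correspondences the eight constraints can be solved directly. The sketch below is a simplified alternative to the singular value decomposition described above, not the method used in LEWIS: it fixes $T_{33} = 1$ (an assumption that fails if the true $T_{33}$ is zero) and solves the resulting 8 × 8 linear system by Gauss-Jordan elimination. All names and the sample transformation are illustrative.

```python
def solve_linear(A, b):
    """Solve A t = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography_from_four_points(model_pts, image_pts):
    """Estimate T (with T33 fixed to 1) from four model-to-image point pairs."""
    A, b = [], []
    for (X, Y), (x, y) in zip(model_pts, image_pts):
        # Two linear constraints per correspondence.
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -x * X, -x * Y])
        b.append(x)
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -y * X, -y * Y])
        b.append(y)
    t = solve_linear(A, b)
    return [t[0:3], t[3:6], [t[6], t[7], 1.0]]


def apply_homography(T, pt):
    """Map a Euclidean point through T and divide out the scale factor."""
    X, Y = pt
    u = [row[0] * X + row[1] * Y + row[2] for row in T]
    return (u[0] / u[2], u[1] / u[2])


# Synthetic check: image four model points with a known T, then recover it.
T_true = [[1.2, 0.1, 3.0], [-0.2, 0.9, 1.0], [0.001, 0.002, 1.0]]
model_pts = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_pts = [apply_homography(T_true, p) for p in model_pts]
T_est = homography_from_four_points(model_pts, image_pts)
```

With more than four noisy correspondences the system is overdetermined, and the homogeneous least-squares formulation in the text is the more appropriate tool.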

Using similar algorithms, projectivities can be computed between sets of lines, or as shown in [Rothwell94]
for different combinations of points, lines and conics. The projective transformation of lines is closely related
to that for points. Given the transformation matrix for points, T, lines transform according to

$$\mathbf{l} = T^{-T} \mathbf{L},$$

where $T^{-T}$ is the inverse transpose of T. The transformation of conics is as follows: given C and its respective
image conic c, and point transformation matrix T, c is constrained by:

$$c = k\, T^{-T} C\, T^{-1}. \tag{4}$$
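These transformation rules are exactly what is needed to preserve incidence: a point on a model line or model conic maps to a point on the transformed line or conic. The following sketch checks this numerically for one arbitrary, hypothetical transformation T, using a point on the unit circle and the tangent line at that point.

```python
def inverse3(A):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


T = [[1.2, 0.1, 3.0], [-0.2, 0.9, 1.0], [0.001, 0.002, 1.0]]
T_inv = inverse3(T)
T_inv_T = transpose(T_inv)

X = [0.6, 0.8, 1.0]                                          # model point on the unit circle
L = [0.6, 0.8, -1.0]                                         # tangent line to the circle at X
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]     # unit circle

x = mat_vec(T, X)                         # image point
l = mat_vec(T_inv_T, L)                   # image line, l = T^{-T} L
c = mat_mul(mat_mul(T_inv_T, C), T_inv)   # image conic, up to scale

point_on_line = dot(l, x)                 # l . x
point_on_conic = dot(x, mat_vec(c, x))    # x^T c x
```

Both quantities vanish (up to rounding) because the factors of T and its inverse cancel in each product.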


2.2  Algebraic  Invariants 

There  are  three  different  algebraic  invariants  used  within  the  recognition  system  for  coplanar  algebraic 
features:  five  lines;  a  pair  of  conics;  and  a  conic  and  two  lines.  Their  derivation  is  given,  for  example, 
in  [Mundy92].  There  are  many  other  possible  configurations  (with  points,  cubics,  etc.)  that  could  also  be 
used  to  generate  invariants.  The  particular  configurations  used  in  LEWIS  have  been  chosen  because  the 
constituent  geometric  features  can  be  produced  directly  and  accurately  from  segmentation.  In  contrast, 
points  are  extracted  most  accurately  indirectly  by  intersecting  lines. 


2.2.1  Five  Coplanar  Lines 


Given five coplanar homogeneous lines $\mathbf{l}_i$, where $i \in \{1, .., 5\}$, two functionally independent projective
invariants are defined using determinants

$$I_1 = \frac{|M_{431}||M_{521}|}{|M_{421}||M_{531}|}, \qquad I_2 = \frac{|M_{421}||M_{532}|}{|M_{432}||M_{521}|}, \tag{5}$$

where $M_{ijk} = (\mathbf{l}_i, \mathbf{l}_j, \mathbf{l}_k)$.

Figure 1: Examples of similarity, affine and perspective images of a bracket. For each view the lines used
to compute the invariants are marked in white. The pair of five-line invariants are computed using the
determinant formulae given in this section. The invariant values for the images, and those actually measured
on the object are shown in table 1. The fact that they remain essentially invariant demonstrates the stability
of the invariants under real imaging conditions. For reference, the values of affine invariants computed from
area ratios are also given in the table.


Table 1: $I_1$ and $I_2$ are five-line invariants computed for the similarity, affine and perspective views of the
bracket shown in figure 1. $I_{a1}$ and $I_{a2}$ are affine invariants defined by the ratio of areas of triangles constructed
from the points of intersection of the lines. The values of $I_1$ and $I_2$ are consistent with those measured on
the object and vary only slightly with viewpoint, which demonstrates the practicality of deriving invariant
measures from image features. Note that, particularly for the image with substantial perspective distortion,
the affine invariants $I_{a1}$ and $I_{a2}$ are not constant.

              I_1     I_2     I_a1    I_a2
object        0.840   1.236   0.739   1.083
similarity    0.842   1.234   0.706   1.051
affine        0.840   1.232   0.743   1.066
perspective   0.843   1.234   0.623   0.949

Table 2: The conic-pair invariants computed for the similarity, affine and perspective views of the computer
tape shown in figure 2. Note the stability of the measured values with respect to change in viewpoint.

              I_1     I_2
object        3.073   3.082
similarity    3.074   3.082
affine        3.072   3.080
perspective   3.070   3.078

A major problem with the determinant formulae given in equation 5 is that the invariants can become
undefined for certain geometric configurations. The determinant $|M_{ijk}|$ vanishes when the lines $\mathbf{l}_i$, $\mathbf{l}_j$ and
$\mathbf{l}_k$ are concurrent. In LEWIS, grouping is used to eliminate configurations where both invariants are undefined
so that one of the values of $I_1$ and $I_2$ can always be used. The grouping algorithm, described later, ensures
that only the lines $\mathbf{l}_i$, $i \in \{1, 3, 5\}$ are allowed to be concurrent. Since there is no determinant of $M_{135}$ in $I_2$,
it will always be well formed, though $I_1$ will sometimes fail.

Examples  of  the  invariants  computed  for  real  image  distortions  are  demonstrated  in  figure  1,  and  the 
invariant  values  given  in  table  1.  The  fact  that  the  values  remain  constant  over  a  change  in  viewpoint 
demonstrates  the  stability  of  the  invariants  under  image  noise. 
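The five-line invariants are straightforward to evaluate. The sketch below assumes the determinant pattern $I_1 = |M_{431}||M_{521}| / (|M_{421}||M_{531}|)$ and $I_2 = |M_{421}||M_{532}| / (|M_{432}||M_{521}|)$, and checks numerically that the values are unchanged when every line is mapped by the same non-singular matrix (standing in for the inverse transpose of a point transformation) and rescaled by its own arbitrary factor. The sample lines, matrix and scale factors are made up for the example.

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix with columns u, v, w (scalar triple product)."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))


def five_line_invariants(lines):
    """Return (I1, I2) for five homogeneous lines; indices follow the 1-based text."""
    def m(i, j, k):
        return det3(lines[i - 1], lines[j - 1], lines[k - 1])

    i1 = (m(4, 3, 1) * m(5, 2, 1)) / (m(4, 2, 1) * m(5, 3, 1))
    i2 = (m(4, 2, 1) * m(5, 3, 2)) / (m(4, 3, 2) * m(5, 2, 1))
    return i1, i2


def transform_line(A, line, scale=1.0):
    """Apply a line transformation A and an arbitrary non-zero rescaling."""
    return tuple(scale * sum(A[r][c] * line[c] for c in range(3)) for r in range(3))


# Five lines in general position (no three concurrent).
lines = [(1.0, 0.0, -1.0), (0.0, 1.0, -2.0), (1.0, 1.0, -5.0),
         (1.0, -1.0, 0.5), (2.0, 1.0, 1.0)]

A = [[1.2, 0.1, 3.0], [-0.2, 0.9, 1.0], [0.001, 0.002, 1.0]]
scales = [1.0, -2.0, 0.5, 3.0, 1.7]
imaged = [transform_line(A, l, s) for l, s in zip(lines, scales)]

model_invariants = five_line_invariants(lines)
image_invariants = five_line_invariants(imaged)
```

Each line appears equally often in the numerator and denominator, so the per-line scale factors cancel, and each ratio contains two determinants above and below, so the determinant of the transformation cancels as well. A division by zero here corresponds to the concurrent configurations discussed above.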

2.2.2  Two  Coplanar  Conics 

A pair of conics $C_i$, $i \in \{1, 2\}$ has two independent projective invariants. These can be expressed in terms of
ratios of eigenvalues [Quan91], or equivalently

$$I_1 = \frac{\mathrm{Trace}[C_1^{-1} C_2]\, |C_1|^{1/3}}{|C_2|^{1/3}}, \qquad I_2 = \frac{\mathrm{Trace}[C_2^{-1} C_1]\, |C_2|^{1/3}}{|C_1|^{1/3}}.$$

If the conics are normalised so that $|C_i| = 1$ the invariants take on the simpler form of:

$$I_1 = \mathrm{Trace}[C_1^{-1} C_2] \quad \text{and} \quad I_2 = \mathrm{Trace}[C_2^{-1} C_1].$$
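The conic-pair invariants can be checked numerically in the same way. The sketch below assumes the unnormalised form, $I_1 = \mathrm{Trace}[C_1^{-1} C_2]\,(|C_1| / |C_2|)^{1/3}$ and its counterpart with the indices exchanged, and uses a real cube root so that negative determinant ratios are handled. Writing S for the inverse of the point transformation, each conic maps to $S^T C S$ up to an arbitrary scale; the matrices and scale factors are invented for the example.

```python
import math


def det3(A):
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def inverse3(A):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = A
    det = det3(A)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def trace(A):
    return A[0][0] + A[1][1] + A[2][2]


def real_cbrt(v):
    """Real cube root, defined for negative arguments as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def conic_pair_invariants(C1, C2):
    d1, d2 = det3(C1), det3(C2)
    i1 = trace(mat_mul(inverse3(C1), C2)) * real_cbrt(d1 / d2)
    i2 = trace(mat_mul(inverse3(C2), C1)) * real_cbrt(d2 / d1)
    return i1, i2


def image_conic(C, S, k):
    """Transform a conic as k * S^T C S, where S is the inverse point transformation."""
    M = mat_mul(mat_mul(transpose(S), C), S)
    return [[k * v for v in row] for row in M]


C1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
C2 = [[2.0, 0.3, -1.0], [0.3, 1.0, 0.5], [-1.0, 0.5, -4.0]]
S = [[1.2, 0.1, 3.0], [-0.2, 0.9, 1.0], [0.001, 0.002, 1.0]]

model_invariants = conic_pair_invariants(C1, C2)
image_invariants = conic_pair_invariants(image_conic(C1, S, 2.5),
                                         image_conic(C2, S, -0.7))
same_conic = conic_pair_invariants(C1, C1)   # both values equal 3
```

For two identical conics both invariants equal 3, the trace of the identity, which gives a useful sanity check on any implementation.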

These invariants have been tested extensively during the development of the system reported in this paper,
and have been found to have good noise characteristics. A simple example showing the measured invariants
for similarity, affine and perspective views of the computer tape shown in figure 2 is given in table 2. The
small deviation of the invariants demonstrates their stability; more complete results are given in [Forsyth91].


2.2.3  A  Conic  and  Two  Lines 

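For a conic C and two lines $\mathbf{l}_i$, $i \in \{1, 2\}$, a single invariant can be computed:

$$I = \frac{(\mathbf{l}_1^T C^{-1} \mathbf{l}_2)^2}{(\mathbf{l}_1^T C^{-1} \mathbf{l}_1)(\mathbf{l}_2^T C^{-1} \mathbf{l}_2)}.$$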

The conic and lines used to compute the invariant for the similarity, affine and perspective image sequence
of the bracket are shown in figure 3, and the corresponding invariant values are given in table 3. Again the
stability of the invariant form is demonstrated over a large range of viewpoints.

Figure 2: Similarity, affine and perspective images of a computer tape with the conics used to compute the
invariants marked in white. The invariant values are given in table 2.


Table 3: The conic and line-pair invariants computed for the similarity, affine and perspective views of the
bracket shown in figure 3. Note the stability of the measured values with respect to change in viewpoint. For
comparison, an affine invariant is also tabulated. In this case, $I_a$ is defined by the ratio of the areas of a
triangle and of the conic itself. One vertex of the triangle is located at the intersection point of the two lines;
the other two vertices are defined by the points of tangency to the conic of the pair of lines through the first
vertex.

              I       I_a
object        1.33    0.398
similarity    1.33    0.389
affine        1.31    0.403
perspective   1.28    0.437

Figure 3: Similarity, affine and perspective images of a bracket with the conic and two lines used to compute
the invariants marked in white. The invariant values are given in table 3.

We  have  found  in  practice  that  the  conic  and  line  pair  invariant  is  not  stable  enough  alone  to  provide 
sufficient  discrimination  for  the  class  of  objects  used  in  our  experiments.  Three  independent  invariants  can 
be  formed  from  three  lines  and  a  conic,  using  the  lines  two  at  a  time.  The  combined  index  provides  better 
discrimination  as  explained  in  section  3.5.1. 

2.3  Canonical  Frame  Invariants

A canonical frame construction can be used to form an invariant signature for smooth planar curves. The
rest of this section describes the construction of the signature for a non-convex class of plane curves; the
work is a projective extension of that of Lamdan, et al. [Lamdan88].

First, we illustrate the concept of a canonical frame with a set of five coplanar points, four used as
a projective basis and the fifth to generate invariants. We then show how four distinguished points can be
defined on a concavity in a plane curve. The rest of the curve can then be considered as a set of individual
points whose coordinates with respect to the projective basis define the signature.

Figure 4: One way of measuring the invariants of five coplanar points in an image (a) is to compute the
projective transformation of four of the points $p_i$, $i \in \{1, .., 4\}$, to reference points in the canonical frame
(b). In this case the projection is to the corners of the unit square. Once this map is known $p_5$ can also be
transformed to the new frame and its coordinates (x, y) used as invariants.

2.3.1  Mapping  Five  Points  to  the  Canonical  Frame 

As  four  points  define  a  projective  mapping  between  two  frames,  the  first  four  points  of  a  set  of  five  can  be 
used  to  define  the  map  between  the  image  frame  and  a  standard  measurement  or  canonical  frame.  The  fifth 
point  can  then  be  mapped  to  the  new  frame  in  which  its  coordinates  are  projectively  invariant.  To  ensure 
that  the  coordinates  really  are  invariant,  the  first  four  points  must  always  be  mapped  to  a  standard  set  of 
four  reference  points  in  the  canonical  frame.  The  choice  of  these  points  is  arbitrary;  the  corners  of  the  unit 
square  may  be  used  (as  in  figure  4),  or  some  other  frame  chosen  according  to  noise  performance. 
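
The five-point construction can be sketched in a few lines. The following is a minimal illustration, not the implementation used in LEWIS: the function names are hypothetical, the map is estimated with a standard direct linear transform (an assumed choice), and the reference points are taken to be the corners of the unit square as in figure 4.

```python
import numpy as np

# Reference points in the canonical frame: corners of the unit square (figure 4).
UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def homography_from_points(src, dst):
    """Return a 3x3 projective map taking four src points to four dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear constraints on the entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # The null vector of the 8x9 system holds the nine entries of H (up to scale).
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 3)

def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    q = H @ np.array([point[0], point[1], 1.0])
    return q[:2] / q[2]

def five_point_invariants(points):
    """Coordinates of the fifth point in the frame defined by the first four."""
    H = homography_from_points(points[:4], UNIT_SQUARE)
    return apply_homography(H, points[4])
```

Applying any plane projectivity to all five image points and recomputing should give the same pair (x, y), provided no three of the four basis points are collinear.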

2.3.2  Mapping  a  Plane  Curve  to  the  Canonical  Frame 

The aim is to find four distinguished points (or lines) on a curve, and use these to define the projectivity T
that can be used to take the whole curve to the canonical frame. The method is shown in figure 5: for the
given concavity, the location of the points of bitangency is determined as described in section 2.4.3. These
are (A) and (D), and they segment the curve of interest from the rest of the edge chain. This curve segment
is known as an M curve. The cast tangents are then determined; these are lines tangent to the M curve
that pass through the bitangency points. The points of cast tangency are (B) and (C). The projection of
the M curve into the frame using T is the curve signature; it is a projective representation of the original
object curve.

2.3.3  Discrimination

Examples of the canonical frame construction for single views of three different objects are given in figure
6. A single M curve for each spanner and the pair of scissors is marked in (a), (b) and (c), and these
are projected into the same canonical frame in (d). All three canonical curves are different and so the
construction provides discrimination (although the spanner curves extracted from (a) and (b) are reasonably
similar, they are sufficiently different for recognition purposes).

Figure 5: (a) Construction of the four points necessary to define the canonical frame for a concavity. The
first two points, (A) and (D), are points of bitangency that mark the entrance to the concavity. Two further
distinguished points, (B) and (C), are obtained from rays cast from the bitangent contact points and tangent
to the curve segment within the concavity. These four points are used to map the curve to the canonical
frame. (b) The curve in the canonical frame. A projection is constructed that transforms the four points in
(a) to the corners of the unit square. The same projection transforms the curve into this frame.

2.3.4  Semi-Local  Description 

Non-global descriptions must be used if objects are to be recognised under occlusion; the canonical frame
construction provides a semi-local object description. Furthermore, for genuine tolerance to occlusion, there
must  be  a  number  of  different  descriptors  on  each  object  so  that  there  is  not  an  excessive  requirement  for 
any  single  object  region  to  be  visible.  This  is  called  redundancy.  Single  objects  frequently  possess  large 
numbers  of  bitangents  (see  figure  13);  this  provides  a  high  degree  of  redundancy  as  each  bitangent  can  be 
used  to  derive  a  canonical  frame.  However,  such  a  high  degree  of  redundancy  is  not  required  for  recognition, 
and  only  a  few  bitangents  are  actually  used  for  shape  description.  For  the  spanner  in  figure  6a,  four  suitable 
bitangents  exist  and  bound  M  curves.  The  four  M  curves  are  shown  in  figure  7. 

2.3.5  Stability 

The final criterion discussed in this section is stability: if the construction is to be useful, similar frame curves
must be obtained from different views of the same object curve. Even if the curves are not identical, they
should be sufficiently similar so that discrimination between objects is possible. This is the case for the
canonical frame construction. Three very different views of a spanner are given in figure 8 (they vary by a
full perspective distortion, and not just an affine one). The same M curve is marked in each image, and
these are mapped to the canonical frame in (d). As can be seen, the construction is stable even over a wide
change in viewpoint.




Figure 6: In (a)-(c) a single M curve and the four distinguished points are marked on each object. The
three curves are projected to the canonical frame and superimposed in (d). The scissor M curve is obviously
very different from each spanner curve, but in fact the two spanner curves are also sufficiently different for
recognition purposes.


Figure 7: Even for a simple object such as a spanner there is a sufficient degree of redundancy when the
canonical frame construction is used. Here, four useful M curves are shown that essentially cover the entire
perimeter of the object, and yet each one is potentially a sufficient recognition cue on its own.

Figure 8: (a)-(c) Three views of a spanner with the extracted M curves and distinguished points marked.
Note the very different appearance due to perspective effects. (d) The canonical frame curves for the
three different views. The curves are almost identical, demonstrating the stability of the method. Of course,
the same curve would result from a projective transformation between the object and canonical frame.


2.3.6  Index  Functions  and  Discrimination 

The  canonical  frame  curves  are  essentially  projectively  invariant  templates  for  the  shapes,  and  so  one  may 
attempt  M  curve  recognition  using  traditional  curve  correlation  matching  techniques  with  model  curves. 
Such  techniques  would  lead  to  a  linear  search  of  the  model  library,  so  instead,  an  index  is  constructed  from 
the  signature.  The  goal  is  to  compute  a  function  of  the  signature  that  uniquely  identifies  the  M  curve.  The 
current  solution  is  to  use  a  few  points  along  the  signature  to  construct  the  index.  This  data  is  adequate  to 
distinguish  the  spanners  and  brackets  used  in  our  experiments.  The  complete  signature  is  retained  as  part 
of  the  model  description  and  used  during  verification  as  a  more  complete  representation  of  the  object  shape. 

The invariant indexes used are constructed using the geometry of figure 9. This construction is similar
to the technique of footprints [Lamdan88], though points are used rather than areas. The drawback of this
method for measuring invariants is the ambiguity occurring when a ray crosses the curve more than once.
However, such multiple crossings did not occur for the model base used in our experiments. The vector of
invariant line lengths l is not used directly as an index. Instead, an index vector M is constructed from l
using a statistical classifier over all curves in the model base. There are two advantages of this: first, the
index is more discriminating than the "raw" lengths; second, the dimension of the index can be reduced and
so the computation of an efficient hashing function is simplified. The Fisher linear discriminant [Duda73],
which is an optimal linear classifier, is used for the computation of the index. The discriminant encodes
information by minimising the intra-class variance (that is, over several examples of the same curve) and
maximising the inter-class separation. It does so by transforming the data to a new (orthogonal) basis,
M = E l, such that feature measurement variance is maximised under projection onto some of the basis
directions, and minimised onto others.

l = (0.339, 0.308, 0.301, 0.280, 0.288, 0.294, 0.313, 0.350, 0.385)

M = (-0.089, -0.181, -1.089, -0.169, 0.602, -0.183, -0.173)

Figure 9: A set of n equally spaced rays is drawn from the point (1/2, 0) so that they intersect the curve
signature. The aim is to construct an n-dimensional length vector l = (l_1, ..., l_n)^T, where l_i is the distance
from the intersection point of the i-th ray to the point (1/2, 0). This distance is projectively invariant. Here
n = 9. The invariant index M is related to l by M = E l, where E is provided by a linear classifier. See text
for details.
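
The ray-length vector of figure 9 can be sketched as follows. This is an illustrative sketch only: the signature is assumed to be available as a polyline in the canonical frame, the angular range of the rays is an assumed parameter (the text does not state it), and where a ray crosses the curve more than once the nearest crossing is taken, which is one arbitrary way of resolving the ambiguity noted above.

```python
import numpy as np

def ray_lengths(curve, origin=(0.5, 0.0), n=9,
                angle_range=(np.pi / 6, 5 * np.pi / 6)):
    """Distances from `origin` to a polyline along n equally spaced rays.

    `curve` is an (N, 2) array of signature points in the canonical frame.
    `angle_range` (radians) is an assumption, not a value from the text.
    Rays that miss the curve are reported as NaN.
    """
    curve = np.asarray(curve, dtype=float)
    o = np.asarray(origin, dtype=float)
    angles = np.linspace(angle_range[0], angle_range[1], n)
    lengths = np.full(n, np.nan)
    for k, phi in enumerate(angles):
        d = np.array([np.cos(phi), np.sin(phi)])      # unit ray direction
        best = np.inf
        for p, q in zip(curve[:-1], curve[1:]):
            e = q - p                                  # segment direction
            denom = d[0] * e[1] - d[1] * e[0]          # cross(d, e)
            if abs(denom) < 1e-12:
                continue                               # ray parallel to segment
            w = p - o
            t = (w[0] * e[1] - w[1] * e[0]) / denom    # distance along the ray
            s = (w[0] * d[1] - w[1] * d[0]) / denom    # position along the segment
            if t > 0.0 and 0.0 <= s <= 1.0:
                best = min(best, t)                    # keep the nearest crossing
        if np.isfinite(best):
            lengths[k] = best
    return lengths
```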

Each basis coordinate is ranked by how much discrimination it yields. Then, enough of the highest ranked
coordinates are chosen to provide the desired separation between the classes. It was found that taking seven
elements of the Fisher discriminant basis is sufficient to define and discriminate a projectively invariant
description for each curve class. An example of the Fisher basis is shown in figure 10. A benefit of using
the classifier for model learning is that an analytic understanding of the statistical characteristics of the
invariant measures is not required. Instead, a number of examples of a single class is built up over a number
of images, and the classifier adjusts its action to account for the variation within each class.

Figure 10: Example of the Fisher linear discriminant. The discriminant is trained here on only three classes,
two of which are shown (black-circles-dashed-line and grey-circles-dotted-line). In each case the curve is
represented as a vector of canonical frame ray lengths, each component corresponding to a different angle
(see figure 9). For each class a number of vectors, measured for the same curve with varying viewpoint, are
included in order to model the intra-class variation. The first eigenvector weighting function produced by
the discriminant is shown (white-circles-solid-line). The first invariant is determined as a scalar product of
the eigenvector and the (mean) vector for each class. Clearly, this invariant will show good discrimination
between the two classes shown.
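
A textbook multi-class Fisher discriminant gives a concrete picture of how the matrix E in M = E l might be obtained. The sketch below is an assumption about the general technique, not a transcription of the classifier used in LEWIS: the function name, the small ridge term added for numerical safety, and the ranking by eigenvalue are all illustrative choices. Note that with C classes the between-class scatter has rank at most C - 1, so at most C - 1 coordinates carry discrimination.

```python
import numpy as np

def fisher_basis(samples, labels, n_components, ridge=1e-9):
    """Rows of E: the n_components most discriminating Fisher directions.

    `samples` is (N, d), one ray-length vector per row; `labels` gives the
    curve class of each row. `ridge` is an assumed regulariser.
    """
    X = np.asarray(samples, dtype=float)
    y = np.asarray(labels)
    dim = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((dim, dim))   # within-class (intra-class) scatter
    Sb = np.zeros((dim, dim))   # between-class (inter-class) scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += ridge * np.eye(dim)
    # Directions maximising between-class over within-class variance are the
    # leading eigenvectors of inv(Sw) @ Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_components]
    return evecs.real[:, order].T

# Usage: the index for a new length vector l is then M = E @ l.
```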

2.4  Segmentation  and  Grouping 

LEWIS requires the segmentation of two different categories of features from image data: lines and conics for
the algebraic invariants; and M curves (image curves terminated with common tangents) for the canonical
frame construction. The features are grouped depending on the type of invariant that they form.

The first step in the feature detection process is edge detection. We have used an implementation of the
Canny edge filter [Canny86]. The next process is segmentation, which can be broken down into three phases:

1.  The  extraction  of  discrete  edgel  chains  from  the  image. 

2.  The  location  of  breaks  between  features,  and  more  generally  the  boundaries  between  each  feature  and 
other  data. 

3.  The  accurate  representation  of  image  features. 

The first step is common for both the algebraic and smooth curve invariants. Single edge curves are extracted
from the edge image using a sequential edgel chain linking. The Canny algorithm produces edges with
sub-pixel accuracy. This edgel position accuracy yields invariant values with smaller variances (about 10% better)
than those computed from integer pixel locations. The reason that using more precise edgel locations does
not produce a more dramatic improvement in the quality of the measured invariants is that the representation
process (principally fitting) is able to smooth out quantisation errors present in the integer edgel locations.

Figure 11: (b) Extracted edge data (Canny) from (a); note that the edge detector fails to locate edges near
shadows and that objects (such as the bracket on the right hand side of the image) have a finite thickness
and so two edges are reported. The fitted conics and lines are shown in (c), where there is generally accurate
location of both tangency and curvature discontinuities.

Even  with  hysteresis,  single  pixel  breaks  can  occur  in  the  edge  chains.  Such  events  are  accommodated 
by  directional  look  ahead  in  a  sequential  scan  of  the  edge  chain.  As  the  quality  of  the  edge  data  in  the 
images  of  interest  is  generally  quite  good,  single  pixel  look-ahead  works  well  for  the  objects  and  illumination 
conditions  used  in  our  experiments.  The  details  of  the  later  stages  of  the  segmentation  process  depend  on 
the  type  of  invariant  that  is  to  be  formed,  and  the  different  techniques  are  discussed  below. 


2.4.1  Algebraic  Features 

Lines and conics are fitted to extracted edge chains using efficient incremental routines based on orthogonal
regression for lines and an improved version of the Bookstein algorithm for conic fitting [Bookstein79]. Full
details of the algorithms are given in [Rothwell94]. An example segmentation is shown in figure 11, where it
is seen that a reasonably complete description of the object boundaries is obtained.
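
For reference, orthogonal regression of a line to a set of edgels can be sketched as below. This is a batch version written for clarity, whereas the text describes incremental routines; the function name and return convention are assumptions. The returned residual is the quantity that a straightness test (such as the one used later for bitangent location) could threshold.

```python
import numpy as np

def fit_line_orthogonal(points):
    """Fit a*x + b*y + c = 0 (with a^2 + b^2 = 1) by orthogonal regression.

    Returns (a, b, c, rms), where rms is the root-mean-square perpendicular
    distance of the points from the fitted line.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The line normal is the direction of least variance of the centred points.
    _, s, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    c = -normal @ centroid          # the fitted line passes through the centroid
    rms = s[-1] / np.sqrt(len(pts))
    return normal[0], normal[1], c, rms
```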




2.4.2  Grouping 


Exploiting structure in the scene for grouping allows invariant indexing to have a low complexity with respect
to the number of image features. The approach used makes use of the connectivity provided by the edge
chains; this implicitly encodes proximity.

For algebraic invariants, connectivity provides an association and also an ordering on the lines: invariants
are formed from sets of consecutive lines within single edge chains at a cost that is linear in the number of
lines in the scene, O(l). This type of grouping was also exploited by Huttenlocher, who also achieved linear
grouping cost [Huttenlocher88].

The use of algebraic curve features rather than isolated points and lines also reduces the combinatorial
cost of grouping. In the case of the invariant formed by a conic and three lines the cost of grouping is O(cl^3),
where c is the number of conics and l the number of lines. This is for a case in which no image structure is
assumed; if connectivity is reliable, the cost reduces to O(cl). The grouping cost for the joint conic invariants
is only O(c^2). For the images under consideration l is of the order of a hundred, and c a few tens.
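
The linear cost of connectivity-based grouping is easy to see in code. In the sketch below (hypothetical function names; each edge chain is assumed to be an ordered, open list of fitted lines) every chain contributes one group per window position, so the number of groups never exceeds the number of lines.

```python
def consecutive_groups(chain_lines, size=5):
    """All runs of `size` consecutive lines within one edge chain."""
    return [tuple(chain_lines[i:i + size])
            for i in range(len(chain_lines) - size + 1)]

def group_scene(chains, size=5):
    """Groups for every chain in the scene; total work is linear in the line count."""
    groups = []
    for lines in chains:
        groups.extend(consecutive_groups(lines, size))
    return groups
```

A chain of seven lines, for example, yields three five-line groups.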

2.4.3  The  Canonical  Frame 

The canonical frame construction requires the accurate location of distinguished points. Stable point
constructions are achieved using curve bitangents and points defined by cast tangents. In order to form a
projective coordinate frame, the canonical frame, four such distinguished points must be found. It is
desirable to achieve a canonical projection of the object boundary curve which is minimally distorted and
has a roughly uniform variance distribution due to image segmentation effects. In practice, this is achieved
by placing the points in the canonical frame in positions that correspond to a fronto-parallel view of the
object [Rothwell94] (yielding an equiform distortion of the object).


Bitangent  Location 

Image bitangents are located using the following four-stage algorithm:

•  Eliminate  points  that  lie  on  approximately  straight  portions  of  curve.  These  cannot  correspond  to 
actual  points  of  bitangency  and  so  should  be  ignored. 

•  Find  points  on  the  same  edge  curve  that  have  approximately  common  tangents. 

•  Check  that  such  pairs  of  points  do  in  fact  correspond  to  bitangents. 

•  Improve  the  localisation  of  the  bitangent  points  using  quadratic  interpolation. 

Straight portions of curve are found by fitting a straight line to short segments of the curve using orthogonal
regression and testing the value of the fitting residual. Approximately straight portions will have a low
residual. The next step is to map the curve into its tangent dual space and look for self-intersections of the
dual curve. Bitangents, where a line is tangent to the curve at two points, correspond to self-intersections
in dual space as shown in figure 12. The mapping of a boundary curve into tangent dual space is based on
the parameters of a running line fit to the curve. The fitted line is locally tangent at each point along the
curve. The representation of the dual space for a curve is essentially the same as a Hough space for lines
and is parameterised by the slope, θ, of the local tangent, and the perpendicular distance of the tangent
to the centre of the image.

Figure 12: For continuous curves, bitangents in the image correspond to self-intersections in the tangent dual
space. Likewise, inflections correspond to cusps.

The  dual  space  is  quantised  into  discrete  cells  of  angle  and  distance.  Since  the  image  curve  is  discrete,  at 
points  of  high  curvature  the  difference  in  tangent  direction  can  vary  significantly  between  adjacent  points. 
This  quantisation  problem  is  overcome  by  linearly  interpolating  between  consecutive  points  in  dual  space. 

Self-intersections, and hence bitangents, are found using a voting scheme in the tangent parameter
space. Two image points voting in the same quantised cell represent a self-intersection. Due to small curve
fluctuations, joint cell occupancy does not always correspond to actual bitangents. False bitangents are
detected by examining regions of the image curve in the proximity of bitangent points. Dual space provides
the location of the bitangencies up to discrete pixel coordinates. Significant improvement in accuracy can
be obtained by interpolating the bitangent locations between the actual measured edgel locations. A local
quadratic fit determines the location of the bitangent points to sub-pixel accuracy. This is done by rotating
the image data so that the initial estimate of the bitangent line is horizontal, and fitting (by regression)
quadratics of the form y = ax^2 + bx + c to data sets either side of the two bitangent points. The cost used is
the error in the y direction. The interpolated bitangent is the line simultaneously tangent to both parabolas.
In the implementation the number of points used for each quadratic fit is 13 (6 either side of the hypothesised
bitangent point). The data sets are centrally weighted using a Gaussian. The weighting was set empirically
by observing how the quality of canonical frame construction changed as the number of points was altered.
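
The sub-pixel refinement step can be illustrated as follows. A line y = mx + k is tangent to y = ax^2 + bx + c exactly when (b - m)^2 = 4a(c - k); equating k for the two fitted parabolas leaves a quadratic in m. The sketch assumes the data have already been rotated so that the initial bitangent estimate is horizontal, and therefore picks the root of smallest slope. The function names and the Gaussian width `sigma` are assumptions; the text does not give the weighting used.

```python
import numpy as np

def fit_parabola(x, y, sigma=3.0):
    """Gaussian-weighted least-squares fit of y = a*x^2 + b*x + c (error in y).

    `sigma` (in samples) is an assumed weighting width centred on the middle
    sample, i.e. on the hypothesised bitangent point.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    g = np.exp(-0.5 * ((x - x[len(x) // 2]) / sigma) ** 2)
    # np.polyfit multiplies residuals by w, so pass sqrt(g) to weight squared errors by g.
    a, b, c = np.polyfit(x, y, 2, w=np.sqrt(g))
    return a, b, c

def common_tangent(p1, p2):
    """Line y = m*x + k tangent to both parabolas, plus the two contact abscissae."""
    a1, b1, c1 = p1
    a2, b2, c2 = p2
    coeffs = [a1 - a2,
              2.0 * (a2 * b1 - a1 * b2),
              a1 * b2 ** 2 - a2 * b1 ** 2 + 4.0 * a1 * a2 * (c1 - c2)]
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < 1e-9].real
    if real.size == 0:
        return None                            # no common tangent exists
    m = real[np.argmin(np.abs(real))]          # nearest to the horizontal estimate
    k = c1 - (b1 - m) ** 2 / (4.0 * a1)
    x1 = (m - b1) / (2.0 * a1)                 # contact point on the first parabola
    x2 = (m - b2) / (2.0 * a2)                 # contact point on the second parabola
    return m, k, x1, x2
```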

The bitangent detection scheme finds many bitangents along single image curves. This is demonstrated in
figure 13. Due to excessive redundancy in the shape representation many of the bitangents can be eliminated
from consideration, preferably those that are not stable:

•  Eliminate any bitangents that have their endpoints too close together.

•  Remove bitangents whose associated M curves are not very deep (only a few pixels).

•  Do not use tangents that cross the image curves. These tangents will be stable, but eliminating such
tangents leads to a simpler canonical frame signature.

Figure 13: In (b) it is shown that there are a large number of bitangents that can be found even for a simple
object such as the spanner in (a). Each one enables the construction of a canonical frame curve, though only
curves that do not cross their own bitangents are used. This reduces the level of redundancy. Bitangents that
will not produce a stable construction are also deleted; this leaves the three bitangents shown in (c).


Cast  Tangents 

A cast tangent is a ray from the bitangent point which is tangent to the M curve. The cast tangent is
made unique by selecting the tangent ray making the largest angle with respect to the bitangent line. The
construction is projectively invariant and cast tangents are found in a manner similar to that for the bitangent
points; again, localisation of the contact point is improved by quadratic fitting.

A sample segmentation for a simple view of a spanner is given in figure 14, in which the bitangent and
cast tangent points and lines are superimposed onto the object. The bitangent points bound the M curves
that are shown in (c).

Figure 14: For a simple object such as the spanner shown in (a) there are two reliable M curves that can
be constructed. The bitangent and cast tangent points and lines are shown superimposed in (b). These are
located to sub-pixel accuracy using the quadratic fitting method described in the text. The M curves bounded
by their bitangent points are shown in (c). The M curves at the ends of the spanner are not used because
their canonical frames cannot be determined stably.


Grouping


The canonical frame construction has a linear grouping cost. This is because all of the features used to form
the frame are ordered around single image curves. This result is identical to that of [Huttenlocher88], and
means that recognition using the construction is very efficient.


2.5  Errors  in  the  Invariant  Measurements 

Before an indexing scheme is implemented, the error distribution of the invariant functions must be
established in order to decide whether a measured image index value is within an acceptable experimental
error bound of the actual model value. The rest of this section describes a pair of experimental
investigations into the expected sizes of the invariant errors (one for algebraic invariants and one for canonical frame
invariants). For the algebraic case the empirical investigation is compared to an analytic calculation.

          I_1                 I_2                 I_3
mean      (0.707, 2.252)      (0.752, 1.492)      [illegible in source]
σ         (0.0031, 0.0170)    (0.0032, 0.0086)    (0.0052, 0.0433)
σ (%)     (0.44, 0.76)        (0.43, 0.58)        (0.99, 1.42)

Table 4: The mean values for the three invariants measured from the image sequence based on the images in
figure 15. The standard deviation σ is computed both as an absolute value, and as a percentage of the mean.
Note that the 3σ mark is well within 5% of the mean, and so such a bound could be used during recognition.

2.5.1  Algebraic  Invariant  Errors 

One can obtain a rough guide to the size of expected errors under ideal imaging conditions by
differentiating the invariant expressions, and assuming an isotropic noise distribution; such analysis was done
in [Forsyth91, Sinclair93], and is also given here. The shortcomings of this type of formulation become
apparent when real images are observed. The only way to understand the errors that may be encountered
within a recognition system is to study real images. All theoretical analysis has to assume some error model
in image measurements; frequently this is founded upon a Gaussian error in the locations of individual
edgels due to what is often called image noise. The results given below demonstrate that errors occur
due to the behaviour of the standard edge detectors used, and cannot be attributed to random noise.

Empirical  Investigation 

The first and twenty-eighth images from the sequence used to do the tests are shown in figure 15. The rest of
the sequence of fifty images was constructed by rotating the object in 2° increments on the calibration table
beneath the object. The lines fitted to the edge data, with the seven lines used to compute the invariants,
are shown in figure 16. The direction of rotation used to form the sequence is also marked. Three different
five-line invariants were computed for each image of the object using these lines. Note that the object is
specular, and is on a black background that is also somewhat specular. While the images do not represent
ideal imaging conditions, edge detection is expected to be quite reliable, since image step edge contrast is
large over the entire boundary.

The mean invariant values for the image set are shown in table 4. These results show that the invariants
are in fact very stable, with standard deviations less than 1.5% of the mean values. From these results the
error measurements that are used during recognition and acquisition are chosen. For the former the aim is
to eliminate as many false negatives as possible and so the error bound is high (that is 5%, which is well
above the 3σ mark), but during acquisition one should be more cautious so that only stable invariants are
used (and so 3% is used, roughly equal to 2σ).

Figure 15: The first and twenty-eighth image in the fifty image sequence used to test the reliability of the
invariants. The rest of the sequence was produced by rotating the calibration table by 2° between images.
Three five-line invariants can be computed for this object using the seven longest lines. The twenty-eighth
image is when line 3 (labelled on figure 16) becomes vertical and both lower and upper edges of the object are
visible. From this viewpoint, the location of the edge boundary becomes ill defined.
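
The error bounds quoted from table 4 can be checked with a few lines of arithmetic. The sketch below uses only the percentage standard deviations listed in the table; the helper names are hypothetical, and the acceptance test is one plausible reading of "within 5% of the model value", not a description of the LEWIS code.

```python
# Standard deviations from table 4, as percentages of the mean.
SIGMA_PERCENT = [(0.44, 0.76), (0.43, 0.58), (0.99, 1.42)]

def worst_case_percent(k):
    """Largest k-sigma deviation over all tabulated invariants, in percent."""
    return k * max(max(pair) for pair in SIGMA_PERCENT)

def within_bound(measured, model, percent):
    """True if `measured` lies within `percent` of the `model` value."""
    return abs(measured - model) <= (percent / 100.0) * abs(model)

# 3 sigma is at most 3 * 1.42 = 4.26%, inside the 5% recognition bound;
# 2 sigma is at most 2.84%, close to the 3% acquisition bound.
```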


Figure  16:  The  lines  fitted  to  Canny  edge  data  from  figure  15a.  The  seven  lines  used  to  compute  the 
invariants,  and  the  direction  of  rotation,  are  marked. 

The value of I_2, computed on the sequence of lines 2 through 6, is plotted with an enlarged scale in figure
17. The shape of the graph is characteristic of all of the invariant constructions. The graph can be split into
three distinct regions:

1.  Region A: All of the lines are located reasonably well by the Canny edge detector, and so the measured
invariants remain constant.

2.  Region B: When the object has been rotated so that line 2 becomes vertical in the image, the edge
on the lower surface parallel to it becomes visible. The edge detector does not find a pronounced
second edge in this orientation, but because of its presence the intensity values no longer form a step
edge at the correct feature, but instead a slope. The Canny output locates a position somewhere along
the slope and not at the top edge. The fitted line is therefore incorrect and causes the invariant value
to be measured erroneously. Notice that as the object is rotated more, the invariant value tends to
decrease (though noisily); this is because the slope causes the fitted line to move further and further
away from the correct edge.

3.  Region C: In this region the effect is more pronounced as edge 3 moves through the vertical. When
the fitted line drifts off the actual geometric edge, there is an obvious systematic error in the invariant
measurement.

Figure 17: When the invariants (in this case the second value of the second invariant) are plotted in greater
detail, a systematic error becomes apparent in their measurement. This is due to the edge detector becoming
distracted towards spurious image features, and is not due to image noise.

As  can  be  seen  from  the  graph  the  systematic  errors  produced  by  the  edge  detector  far  outweigh  any  Gaussian 
or  quantisation  noise  observed  in  the  points.  Such  noise  will  still  be  present,  though  its  effects  are  small 
compared  to  other  errors.  It  could  be  observed  more  clearly  by  removing  the  effects  of  the  systematic  error. 
Note  that  other  unmodelled  image  events,  such  as  shadows  and  close  proximity  of  other  objects,  will  also 
hinder  the  extraction  of  the  planar  geometric  boundary. 

Analytic  Investigation 


The gross effects of the systematic error can be estimated by perturbing the invariant expressions. Given
the expression for the second invariant:

$$ I_2 = \frac{|M_{421}|\,|M_{532}|}{|M_{432}|\,|M_{521}|}, $$

the aim is to determine the effect of, say, the third line on $I_2$. If the lines used to evaluate the expression
are of the form $\mathbf{l}_i = (a_i, b_i, 1)^T$, $i \in \{1, \ldots, 5\}$, then:

$$ \frac{\partial I_2}{\partial a_3} = \frac{(b_2 - b_3)\,|M_{421}|\,|M_{245}|}{|M_{521}|\,|M_{432}|^2}. $$

The model for the error observed in the measurement of the fitted lines is a translation parallel to the line
normal by $\delta$. If the gradient of the line is $\tan\theta$, by letting $\alpha = \delta / \cos\theta$ the equation of the perturbed line is:

$$ \frac{a_3}{1 - \alpha b_3}\,x + \frac{b_3}{1 - \alpha b_3}\,y + 1 = 0. $$

This directly yields $(\partial a_3 / \partial\delta,\ \partial b_3 / \partial\delta)$, from which $\delta$ can be estimated given a known $\Delta I_2$.

From region C of figure 17 the value of $\Delta I_2$ can be estimated as 0.035. This is assumed to be due entirely
to the movement of $\mathbf{l}_3$, with all the other lines measured correctly. Applying the analysis yields a
value of $\delta = 2$ pixels for this $\Delta I_2$; this certainly is of the right order of magnitude for the error in fit observed
in the image sequence.

Figure 18: The first and twentieth views of the spanner in the sequence are shown. The M curve of interest
is the left-most one in (a), and the distant one in (b).
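
The perturbation analysis for the second five-line invariant can also be carried out numerically, which avoids differentiating by hand. The sketch below is illustrative only: it assumes that |M_ijk| denotes the determinant of the matrix whose columns are lines i, j and k, that each line is stored as (a, b, 1), and it replaces the analytic derivative by a central finite difference. The function names, the step `h` and the sign convention for the shift are assumptions.

```python
import numpy as np

def det3(lines, i, j, k):
    """|M_ijk|: determinant of the matrix with lines i, j, k (1-based) as columns."""
    return np.linalg.det(np.column_stack([lines[i - 1], lines[j - 1], lines[k - 1]]))

def invariant_I2(lines):
    """Second five-line invariant, |M421||M532| / (|M432||M521|)."""
    num = det3(lines, 4, 2, 1) * det3(lines, 5, 3, 2)
    den = det3(lines, 4, 3, 2) * det3(lines, 5, 2, 1)
    return num / den

def shift_line(line, delta):
    """Translate the line a*x + b*y + 1 = 0 by delta along its normal (a, b)."""
    a, b, _ = line
    scale = 1.0 - delta * np.hypot(a, b)
    return np.array([a / scale, b / scale, 1.0])

def estimate_shift(lines, delta_I2, index=3, h=1e-4):
    """First-order estimate of the shift of line `index` that explains delta_I2."""
    def I2_at(delta):
        moved = [np.asarray(l, dtype=float) for l in lines]
        moved[index - 1] = shift_line(moved[index - 1], delta)
        return invariant_I2(moved)
    slope = (I2_at(h) - I2_at(-h)) / (2.0 * h)   # dI2/d(delta) by central difference
    return delta_I2 / slope
```

Given the five fitted lines of an image and an observed change in the invariant, `estimate_shift` returns the normal displacement of the chosen line that would account for it to first order.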


2.5.2  Canonical  Frame  Invariant  Errors 

An  empirical  experiment  similar  to  that  for  the  five  line  invariant  has  been  done  for  an  object  for  which 
canonical  frame  invariants  can  be  computed.  Two  images  from  the  sequence  used  to  measure  the  invariants 
are  shown  in  figure  18.  The  value  of  the  first  invariant  measured  for  each  image  is  plotted  in  figure  19  against 
spanner orientation, which is varied through 180° in 5° increments. Note that the value of the invariant is
stable,  but  again  a  systematic  error  is  apparent  when  the  graph  is  observed  in  more  detail. 


3  Recognition 

3.1  Overview 

An  outline  of  LEWIS  is  shown  in  figure  20. 

Figure 19: The first invariant measured for the image sequence in figure 18 is plotted as the spanner is
rotated by 5° between images. Note in (b) that the edge detector produces a systematic error due to its
being distracted by the finite thickness of the object.

3.1.1  Feature  extraction  and  Invariant  Formation 

The goal of feature extraction is the formation of geometric primitives suitable for constructing invariants. In
the algebraic case this involves straight lines and conics, and for non-algebraic curves, M curves delineated
by bitangents. The fitting and grouping processes were described in section 2.4.

Once sets of grouped features, F, have been produced, the invariants listed in sections 2.2 and 2.3 are
computed. Each set of grouped features, or M curve, produces a number of invariants (one or more) which
form a vector^ M(F). Of course, if the object is occluded to the extent that the number of features visible
is insufficient for an invariant, then no index can be formed.

The invariant vector formed by the above process (when quantised) represents a point in the multidimensional
invariant space. Each object feature group is represented by a collection of points that define a
region in the invariant space, the size of which depends upon the measured variance in the invariant value
(see section 2.5).

3.1.2  Indexing 

The  invariant  values  computed  from  the  target  image  are  used  to  index  against  invariant  values  in  the  library. 
If  the  value  is  in  the  library,  a  preliminary  recognition  hypothesis  is  generated  for  the  corresponding  object. 

^At  this  stage  in  the  processing,  each  feature  group  is  used  to  form  a  separate  M  vector.  Interactions  between  the  groups 
are  only  considered  later. 

Figure 20: LEWIS has a single greyscale image as input and the outputs are verified hypotheses with associated
confidence values. Many of the processes are shared by the acquisition and the recognition paths.
The recognition system is similar to previous systems in all but the indexing and hypothesis formation
stages [Grimson90].

Each type of invariant (for instance that for five lines, or a conic pair) generates separate hypotheses.

This  process  is  made  more  efficient  using  a  hash  table  that  allows  simultaneous  indexing  on  all  elements 
of  the  measurement  vector.  In  the  experiments  to  date  there  has  not  been  any  significant  problem  with 
collisions  in  the  hash  table.  Hash  table  collisions^  should  not  be  confused  with  the  intersection  of  object 
invariant  measurements  in  index  space.  These  intersections  lead  to  erroneous  hypotheses  which  cost  some 
effort  during  the  verification  stage,  but  are  usually  eliminated. 
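
The indexing step can be illustrated with a small hash table keyed on the quantised invariant vector. This is a minimal sketch under stated assumptions, not the LEWIS implementation: the bin width, the 3% tolerance used to spread each library entry over neighbouring bins, the function names and the second model ("widget") are all hypothetical. Only the bracket value is taken from the five-line entry reported for the bracket in table 5.

```python
import itertools
import math
from collections import defaultdict

BIN_WIDTH = 0.02  # hypothetical quantisation step for invariant values

def bin_index(value, bin_width=BIN_WIDTH):
    """Quantise one invariant value to an integer bin."""
    return math.floor(value / bin_width)

def build_sublibrary(models, rel_tol=0.03):
    """Build a hash table for one invariant type.

    Each model vector is entered into every bin that intersects the box
    value +/- rel_tol, so the model occupies a region of invariant space.
    """
    table = defaultdict(set)
    for name, vector in models:
        ranges = []
        for v in vector:
            spread = abs(v) * rel_tol
            ranges.append(range(bin_index(v - spread), bin_index(v + spread) + 1))
        for key in itertools.product(*ranges):
            table[key].add(name)
    return table

def index(table, measured):
    """Return the model names hypothesised for a measured invariant vector."""
    key = tuple(bin_index(v) for v in measured)  # all elements indexed at once
    return set(table.get(key, ()))

FIVE_LINE = build_sublibrary([
    ("bracket", (0.8415, 1.2340)),  # library value quoted in table 5
    ("widget", (1.1000, 0.9500)),   # made-up second model
])
```

A lookup such as `index(FIVE_LINE, (0.842, 1.235))` returns the set of candidate models in a single dictionary access. Two models whose regions overlap in invariant space would both be returned, and the erroneous one is left for verification to remove.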

3.1.3  Hypothesis  Merging 

Because many invariants may actually correspond to the same object, and should therefore be covered by
a single recognition hypothesis, joint hypotheses are formed prior to recognition by combining 'compatible'
hypotheses. There are a number of reasons why hypothesis merging is desirable:

1.  Backprojection  and  searching  for  image  support  (verification)  is  computationally  expensive  and  it  is 
more  efficient  to  validate  several  hypotheses  of  the  same  object  together. 

2.  More  features  facilitate  a  more  accurate  least  squares  calculation  of  the  back  projection  transformation 
(there  are  more  matched  model  and  image  features),  and  consequently  a  reduced  error  in  measuring 
image  support. 

3.  Many  hypotheses  indexing  the  same  object  in  a  single  part  of  the  scene  significantly  increase  confidence 
that  the  match  is  correct. 

During hypothesis merging, an interpretation tree is constructed for each object. The features used in
the  tree  are  the  groups  of  invariant  features  that  were  successful  in  indexing.  The  merging  process  utilises 
topology  and  geometric  compatibility.  The  topological  consistency  (ordering  and  connectedness)  is  illustrated 
in  figure  21.  Geometric  consistency  is  implemented  efficiently  by  a  second  use  of  invariants;  this  time  joint 
invariants  between  the  feature  groups  used  to  compute  each  individual  hypothesis.  This  is  illustrated  in 
figure  22. 

Since  topological  relations  are  often  unreliable  it  is  possible  that  two  hypotheses  could  be  united  into 
a  single  joint  hypothesis  even  though  they  are  totally  unrelated  (for  example  one  may  represent  a  correct 
match and the other may have been caused by clutter). A list of all the original hypotheses and all possible
combinations  of  compatible  hypotheses  is  therefore  maintained.  The  list  is  ordered  by  descending  number  of 
simple  hypotheses  per  joint  hypothesis.  Those  with  more  simple  hypotheses  are  verified  first,  and  if  the  match 
is  confirmed,  other  joint  hypotheses  that  represent  partial  versions  of  the  hypothesis  are  deleted.  If  the  match 
is  not  confirmed  only  the  joint  hypothesis  under  consideration  is  deleted.  The  joint  hypothesis  formation 

^A hash table collision occurs when a number of models have the same hash index. Such a collision can occur when the
number  of  hash  buckets  is  smaller  than  the  model  population  or  when  the  hashing  function  is  not  uniform  and  causes  many 
models  to  hash  to  the  same  bucket. 

Figure 21: If the same model is indexed by a five-line invariant (due to lines $l_i$, $i \in \{1, \ldots, 5\}$) and a conic
three-line invariant that is compatible with it (due to $C$ and $l_i$, $i \in \{2, \ldots, 4\}$), then it is wise to verify both
hypotheses together. The invariants are compatible if the ordering of the image lines is consistent with that
on the model; see text for details.


Figure 22: For a pair of M curves there are 8 distinguished points which could be used to form
2 × 8 - 8 = 8 different five point invariants. Rather than computing so many, which is unnecessary, invariants
are computed between the four distinguished points of each M curve, and the 'central' point of the other.
This yields four invariants, and does so using a symmetric construction. These invariants are sufficient to
hypothesise compatibility.
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stage  can  potentially  cause  an  exponential  number  of  hypotheses  to  be  formed.  However,  in  practice,  deleting 
verified  hypotheses  keeps  the  verification  process  under  control  (as  is  shown  later  in  table  6). 
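
The ordering and deletion rule for joint hypotheses can be sketched as follows. This is an illustrative sketch only: joint hypotheses are represented as frozensets of simple hypothesis labels, and the `compatible` and `verify` callbacks are hypothetical stand-ins for the topological and geometric compatibility tests and for backprojection.

```python
from itertools import combinations

def joint_hypotheses(simple, compatible):
    """Enumerate every set of simple hypotheses that is pairwise compatible.

    Singletons are included, so the original hypotheses are kept in the list.
    """
    joints = []
    for size in range(1, len(simple) + 1):
        for combo in combinations(simple, size):
            if all(compatible(a, b) for a, b in combinations(combo, 2)):
                joints.append(frozenset(combo))
    return joints

def verify_all(joints, verify):
    """Verify joint hypotheses, largest first.

    A confirmed hypothesis deletes all pending hypotheses that are partial
    versions (subsets) of it. A rejected hypothesis deletes only itself.
    Returns the confirmed hypotheses and the number of verifications run.
    """
    pending = sorted(joints, key=len, reverse=True)
    confirmed = []
    n_verified = 0
    while pending:
        current = pending.pop(0)
        n_verified += 1
        if verify(current):
            confirmed.append(current)
            pending = [j for j in pending if not j <= current]
    return confirmed, n_verified
```

The enumeration in `joint_hypotheses` is exponential in the worst case, which is the concern raised in the text. The subset deletion in `verify_all` is what keeps the number of verifications small when a large joint hypothesis is confirmed early.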

3.1.4  Verification 

There  are  two  steps  involved  in  verification,  both  of  which  can  reject  a  (joint)  recognition  hypothesis.  The 
first  is  an  attempt  to  compute  a  common  projective  transformation  between  the  model  features  and  the 
putative  corresponding  features  in  the  target  image.  The  second  is  to  use  this  transformation  to  project  the 
entire  model  onto  the  target  image,  and  then  measure  image  support. 

Incorrect hypotheses arise because grouped image features happen to have an invariant value that coincides
(within the error bounds) with one in the library. Also, because the invariants are not complete
(completeness is defined in detail in [Rothwell94]), structures with the same invariant may not be projectively
equivalent. The features used to produce the matching model and image invariants provide sufficient
constraints to compute the projective transformation between the model and image^. In general, the projective
transformation is overdetermined as the feature groups tend to provide more than the required eight
constraints. Consequently, if a common transformation cannot be computed, the features are not projectively
equivalent and the hypothesis is rejected [Rothwell94].

Backprojection  and  subsequent  searching  involves  the  entire  model  boundary,  not  just  the  features  used 
to form the invariant. Projected model edgels must lie close to image edgels with similar orientation (within
5  pixels  and  15°).  In  the  case  of  algebraic  features,  two  preliminary  hypothesis  filtering  steps  can  be  invoked: 

1.  The  model  lines  must  project  to  within  15°  of  the  image  lines. 

2.  The  projected  model  conics  must  project  to  ellipses,  and  they  must  have  similar  circumferences  and 
areas  to  the  image  conics. 

Orientation  in  the  target  image  is  determined  from  the  Canny  edgel  orientation.  The  orientation  of  the 
projected  model  feature  is  determined  as  follows: 

1.  For  model  edgels  on  straight  lines  the  projected  orientation  of  the  line  is  used  for  each  edgel. 

2. For model edgels on conics, the orientation is obtained from the projected conic via their polars^, which
are close approximations to the tangents for edgels close to the conic.

3.  For  other  edgels,  the  orientation  provided  by  the  edge  operator  is  used.  This  orientation  is  determined 
in  the  target  image  by  projecting  the  tangent  line  to  the  model.  Model  edgel  orientations  are  less 
accurate  (than  a  fitted  line  or  conic),  so  a  30°  threshold  is  used  instead  of  15°. 

^Unless the invariant exploits an isotropy. In this case, certain parameters of the transformation are unrecoverable because
they  do  not  affect  the  projected  geometry,  e.g.  a  circle  under  rotation  about  its  center. 

^For a point $\mathbf{x}$ and a conic $C$, the polar $\mathbf{l}$ of the point with respect to the conic is $\mathbf{l} = C\mathbf{x}$.

Figure 23: (a) shows a simple scene with two objects in it. The output of the Canny edge detector is shown
in (b). The 3-4 distance transform is computed for all edges over a certain strength and is displayed in (c);
distance from the edge (white) is coded by intensity, with zero being black. (d) shows, coded by intensity, the
orientation of the edge nearest to a given image point.

If  more  than  a  certain  proportion  of  the  projected  model  data  is  supported  (the  threshold  used  is  50%),  there 
is  sufficient  support  for  the  model,  and  the  recognition  hypothesis  is  confirmed.  The  final  part  of  the  process 
is  expensive  as  0(10^)  edgels  need  to  be  mapped  onto  the  image.  Efficiency  in  the  distance  computation 
is  achieved  by  approximating  the  distance  using  the  3-4  distance  transform  of  Borgefors  [Borgefors88].  The 
distance  transform  is  found  by  passing  chamfer  masks  over  the  image,  which  is  carried  out  within  image 
preprocessing.  An  example  of  the  3-4  distance  transform  output  for  a  simple  image  is  shown  in  figure  23. 
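
The 3-4 chamfer distance transform and the support measurement it enables can be sketched in a few lines. This is a minimal pure-Python illustration, assuming a binary edge map given as a list of rows. It checks only the distance criterion, not edgel orientation, and the function names are hypothetical.

```python
INF = 10 ** 9  # larger than any distance in a realistic image

def chamfer_34(edges):
    """Two-pass 3-4 chamfer distance transform of a binary edge map.

    Axial steps cost 3 and diagonal steps cost 4, so the result divided by 3
    approximates the Euclidean distance in pixels to the nearest edge.
    """
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    # Forward pass: propagate from the row above and from the left.
    for y in range(h):
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 3)
                if x > 0:
                    d[y][x] = min(d[y][x], d[y - 1][x - 1] + 4)
                if x < w - 1:
                    d[y][x] = min(d[y][x], d[y - 1][x + 1] + 4)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 3)
    # Backward pass: propagate from the row below and from the right.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 3)
                if x > 0:
                    d[y][x] = min(d[y][x], d[y + 1][x - 1] + 4)
                if x < w - 1:
                    d[y][x] = min(d[y][x], d[y + 1][x + 1] + 4)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 3)
    return d

def support_fraction(points, dist, max_pixels=5):
    """Fraction of projected model points (x, y) within max_pixels of an edge."""
    limit = 3 * max_pixels  # convert pixels to chamfer units
    hits = sum(1 for x, y in points if dist[y][x] <= limit)
    return hits / len(points)
```

Because the transform is computed once per image during preprocessing, testing each projected edgel reduces to a single table lookup. A hypothesis would then be accepted when `support_fraction` exceeds the 50% threshold.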

If  the  projected  model  is  too  small  in  the  scene  it  must  have  arisen  from  an  object  so  far  away  that  it 
would  not  be  observed  reliably.  An  upper  bound  on  the  size  of  the  projected  model  can  be  computed  by 
bounding  the  model  by  a  box  and  projecting  that  to  the  image  first;  if  it  is  too  small,  then  the  hypothesised 
object  must  be  too  small  and  so  can  be  rejected.  In  practice  the  bounding  box  used  is  the  perimeter  of  the 
acquisition  image. 

There is a trade-off involved in setting the support threshold. A heavily occluded correct match may
have as much support as an incorrect match. In particular, if there is dense edge data (such as wood texture),
it is quite likely that a large number of edges will be close to, and have the same orientation as, the
projected model edges. In a structured scene, a few erroneous straight lines of the right orientation will be
sufficient to give over 50% support for a model, and so render a false positive. Obviously, any object which
is  over  50%  occluded  will  not  be  found  by  the  recogniser.  As  the  threshold  is  lowered,  an  occluded  object 
is  more  likely  to  be  found,  but  there  will  also  be  more  false  positives.  On  the  other  hand,  if  more  than  one 
invariant  forms  a  hypothesis  that  passes  verification,  the  level  of  confidence  in  the  result  is  high.  This  is 
discussed  further  in  section  4.1. 


3.2  Model  Acquisition  and  Library  Formation 

A  model  consists  of  the  following: 

1.  A  name. 

2.  A  set  of  edge  data  from  an  acquisition  view  of  the  object  for  use  in  the  backprojection  stage  of 
verification. 

3.  The  lines,  conics  and  M  curves  that  represent  the  edge  data. 

4.  The  expected  invariant  values  and  which  algebraic  features  or  M  curves  they  correspond  to. 

5.  The  bounding  box  of  the  model  features. 

6.  Topological  connectivity  relations  between  feature  groups  that  will  be  used  in  the  construction  of  joint 
invariants. 

The  library  is  segmented  into  different  sub-libraries,  one  for  each  type  of  invariant.  Each  sub-library  has  a 
list  of  each  of  the  invariant  values  tagged  with  an  object  name,  and  is  structured  as  a  hash  table. 

One  benefit  of  using  only  projective  representations,  rather  than  Euclidean,  is  that  model  acquisition  can 
be  done  directly  from  images.  No  special  orientations  or  calibrations  are  required.  Acquisition  is  simple  and 
semi-automatic  (for  instance,  curves  do  not  have  to  be  matched  by  hand).  It  proceeds  as  follows: 

1. A number of images are taken of the isolated object from a variety of 'standard' viewpoints (for
algebraic invariants two images are used; for the canonical frame system more are generally required
to compute the Fisher discriminant).

2. The invariants are computed for each view. This involves the same segmentation and invariant computation
as used during recognition. For non-algebraic curves significant M curves are extracted.

3.  The  invariant  values  are  compared  between  views.  The  useful  invariant  shape  descriptors  will  remain 
reasonably  constant  under  a  change  in  viewpoint.  These  are  recorded  in  the  modelbase.  Any  measures 
that  are  not  constant  are  due  to  features  that  do  not  form  correct  invariant  configurations  (for  instance 
lines  that  are  not  coplanar),  or  are  caused  by  unstable  features.  For  matching  values  (within  3%,  see 
section  2.5),  the  mean  value  is  entered  into  the  model  library. 

Figure 24: Nine of the models in LEWIS's model base (the edge data of the models is shown).

4. Connectivity between algebraic features or M curves is utilised to form joint invariants to be used
during hypothesis combination.
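
Step 3 of the acquisition procedure, the comparison of invariant values between views, can be sketched as below. This is an illustration under assumptions: feature groups are taken to be already matched between the two views and identified by a shared key, the 3% tolerance is applied relative to the mean value, and the function name is hypothetical.

```python
def stable_invariants(view_a, view_b, rel_tol=0.03):
    """Keep invariants that agree between two acquisition views.

    view_a and view_b map a feature-group id to its invariant vector.
    Groups whose values all agree to within rel_tol of their mean are stored
    with the mean value. The rest are discarded as unstable or as not forming
    a valid invariant configuration (for example, non-coplanar lines).
    """
    library = {}
    for group, va in view_a.items():
        vb = view_b.get(group)
        if vb is None or len(va) != len(vb):
            continue  # group was not found in the second view
        means = [(x + y) / 2 for x, y in zip(va, vb)]
        if all(abs(x - y) <= rel_tol * abs(m) for x, y, m in zip(va, vb, means)):
            library[group] = tuple(means)
    return library
```

With more than two views, as needed for the canonical frame system, the same test would be applied to the spread of values over all views. That extension is not shown here.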

3.3  Algebraic  Invariants  Examples 

The  results  reported  here  have  been  carried  out  with  a  model  library  containing  over  thirty  objects.  Typical 
algebraic  objects  in  the  library  are  shown  in  figure  24.  Recognition  accuracy  is  excellent  if  the  object 
boundaries  are  not  severely  disrupted  by  shadows  and  specularities.  On  a  SPARC  IPX,  edge  detection  takes 
15  seconds;  feature  extraction  5  seconds;  matching  less  than  a  second;  and  verification  normally  about  2 
seconds. 

The  first  recognition  example  is  a  bracket  in  a  scene  with  occlusion  and  clutter  caused  by  other  objects 
(figure  25).  All  possible  algebraic  invariants  are  formed  from  configurations  of  lines  and  conics.  The  measured 
and  the  matching  values  are  given  in  table  5.  From  a  scene  such  as  this,  a  large  number  of  possible  invariants 
can  be  derived.  It  was  found  that  two  image  five-line  invariants  matched  model  invariants  of  the  bracket, 
with  the  second  (incorrect)  one  ruled  out  during  verification.  Three  conic-and-three-line  invariants  were 
measured  in  the  scene  that  matched  the  invariants  of  the  bracket,  and  all  these  constituted  correct  matches. 
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Figure  25:  (a)  shows  the  bracket  occluded  in  a  scene.  Some  of  the  occlusion  is  due  to  an  object  not  in  the 
model  library.  In  (b)  the  edge  data  from  the  first  calibration  scene  are  shown  projected  onto  the  test  scene 
using  the  model  to  image  transformation  hypothesised  by  the  match.  The  close  match  between  the  projected 
data  (shown  in  white),  and  the  scene  edges  shows  that  the  recognition  hypothesis  is  valid.  Projected  edge  data 
from  the  model  of  a  spanner  are  also  shown  projected  into  the  scene  as  this  was  also  recognised. 


Table 5: The invariants measured from figure 25 which are formed by features actually corresponding to
bracket features. The second column shows the library values and the third column scene values. In the
fourth column the deviations from the mean invariant values are given; this shows that the five-line invariant
is very stable under real image conditions, and the conic-and-three-line invariant is reasonably stable.

invariant     library                   scene                  error %
five-line     (0.8415,1.2340)           (0.842,1.235)          (0.1,0.1)
conic-line    (1.3410,1.3080,2.6285)    (1.372,1.291,2.676)    (2.3,1.3,1.8)
conic-line    (1.3080,1.3025,1.8850)    (1.291,1.287,1.852)    (1.3,1.2,1.8)
conic-line    (1.3025,1.3395,2.5915)    (1.287,1.365,2.604)    (1.2,1.9,0.5)
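
The error column of table 5 can be reproduced from the library and scene columns. The short sketch below assumes that the error is the absolute deviation of the scene value from the library value, expressed as a percentage of the library value and rounded to one decimal place. The names `TABLE_5` and `percent_error` are illustrative.

```python
TABLE_5 = [
    ("five-line", (0.8415, 1.2340), (0.842, 1.235)),
    ("conic-line", (1.3410, 1.3080, 2.6285), (1.372, 1.291, 2.676)),
    ("conic-line", (1.3080, 1.3025, 1.8850), (1.291, 1.287, 1.852)),
    ("conic-line", (1.3025, 1.3395, 2.5915), (1.287, 1.365, 2.604)),
]

def percent_error(library, scene):
    """Percentage deviation of each scene value from its library value."""
    return tuple(round(100 * abs(s - l) / l, 1) for l, s in zip(library, scene))

ERRORS = [percent_error(lib, scene) for _, lib, scene in TABLE_5]
```

Under that assumption the computed values agree with the tabulated ones, and every deviation is below the 3% matching tolerance quoted in section 3.2.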

Incorrectly  indexed  hypotheses  can  be  ruled  out  during  verification  when  the  hypothesised  model  is  projected 
into  the  scene  (all  such  hypotheses  were  ruled  out  in  this  case).  For  the  bracket,  74.5%  of  the  projected 
edges match to within 5 pixels and 15° of the image data. There is a second object from the model base, a
spanner,  also  in  the  scene.  This  is  correctly  identified  using  three  different  invariants.  In  this  case  an  84.5% 
projected  edge  match  is  achieved  with  the  model  data  also  shown  in  white  in  figure  25. 

In  figure  26  the  bracket  is  recognised  despite  a  significant  amount  of  occlusion  (in  this  case  there  is  only 
a  59.3%  edge  match  during  verification).  Figures  27  to  45  show  the  system  operating  on  a  few  test  scenes 
with  some  of  the  match  statistics  shown.  For  figure  27,  1049  invariants  were  computed  which  indexed  41 
hypotheses.  These  were  converted  into  131  joint  hypotheses  that  had  to  be  verified,  of  which  13  were  rejected 
by  first  stage  verification,  based  on  valid  projective  transformations,  and  78  required  the  second  stage,  based 
on  image  support.  For  figure  28,  806  invariants  indexed  36  hypotheses,  forming  44  joint  hypotheses  of  which 
23  needed  the  second  verification  stage  after  13  were  rejected  by  the  first  stage. 

In  table  6  various  match  statistics  are  shown  that  have  been  taken  from  a  number  of  scenes  (around 
100)  similar  to  that  of  figure  26.  In  each  case,  a  single  object  from  the  model  library  was  in  the  scene, 
and it was recognised correctly in all except one instance, which was when verification broke down due to

Figure 26: (a) shows the bracket occluded in a scene by objects not in the library. In (b) the edge data
from an acquisition scene are shown projected onto the test scene using the model to image transformation
hypothesised by the match. The close match between the projected data (shown in white), and the scene edges
shows that the recognition hypothesis is valid.

Figure 27: (a) shows a scene containing two objects from the model base, with fitted lines (100 of them) and
conics (27) superimposed in (b). Note that many lines are caused by texture, and that some of the conics
correspond to edge data over only a small section. The lines form 70 different line groups. (c) shows the
two objects correctly recognised, the lock striker plate matched with a single invariant and 50.9% edge match,
and the spanner with three invariants and 70.7% edge match.



Figure 28: Another typical scene containing three objects from the model base. The recognised objects are
outlined with 74.7% (2 invariants), 84.6% (1 invariant) and 69.9% (3 invariants) edge matches for the objects
from left to right. 58 lines and 14 conics were found.

a  poor  segmentation  preventing  a  sufficient  amount  of  edge  support.  The  total  number  of  indexes  formed 
(an  average  of  1755.3),  depends  solely  on  the  number  of  features  in  the  scene,  and  the  way  in  which  they 
are  grouped.  This  number  roughly  equates  to  the  number  of  hypotheses  that  would  have  to  be  verified  per 
model for a hypothesise-and-test technique. After indexing, these form only an average of 60.4 hypotheses,
which  constitutes  a  nearly  thirty-fold  reduction.  Because  of  redundancy  in  the  shape  representation,  multiple 
hypotheses  can  occur  for  a  single  model  instance.  Joint  hypothesis  formation  processing  yields  an  average 
of  72.7  joint  hypotheses. 

Verification  is  performed  once  the  joint  hypotheses  have  been  constructed.  On  average,  5.9  hypotheses 
do  not  have  to  be  verified  as  their  structures  are  subsumed  by  larger  joint  hypotheses.  This  means  that  only 
66.8  joint  hypotheses  actually  require  verification,  which  is  similar  to  the  original  60.4  individual  hypotheses. 
It is clear that the joint hypothesis formation stage does not lead to an exponential number of hypotheses
being  formed,  and  yet  it  provides  improved  recognition.  Once  the  projectivity  between  the  hypothesised 
models  and  the  image  features  has  been  computed,  a  check  is  made  that  the  projected  and  image  algebraic 
features are consistent (the preliminary filter). On average this filter removes 41.1 hypotheses: 6.4 due to
line correspondences, 4.9 for conic and line configurations, and 29.8 for conic configurations. In the end, only
25.7 hypotheses have to be verified through full back projection (23.7 of which are rejected there; see table 6),
compared with the 1755.3 original indexes formed.
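
The bookkeeping behind these averages can be laid out explicitly. The sketch below uses only the figures quoted in the text and in table 6. The dictionary keys and the `funnel` function are illustrative names, and the interpretation that the hypotheses surviving full back projection are the accepted ones is an assumption.

```python
import math

# Average match statistics quoted in the text and in table 6.
STATS = {
    "indexes": 1755.3,
    "individual": 60.4,
    "joint": 72.7,
    "subsumed": 5.9,                  # not requiring any verification
    "rejected_algebraic": 41.1,       # 6.4 + 4.9 + 29.8 in the text
    "rejected_backprojection": 23.7,
}

def funnel(s):
    """Number of hypotheses remaining after each verification stage."""
    need_verification = s["joint"] - s["subsumed"]
    reach_backprojection = need_verification - s["rejected_algebraic"]
    accepted = reach_backprojection - s["rejected_backprojection"]
    return {
        "index_reduction": s["indexes"] / s["individual"],
        "need_verification": need_verification,
        "reach_backprojection": reach_backprojection,
        "accepted": accepted,
    }

RESULT = funnel(STATS)
```

Under these assumptions the remainder after all three rejection stages is 2.0 hypotheses per scene, which matches the one correct hypothesis plus one false positive reported in table 6.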

In  each  case  a  single  object  should  have  been  recognised.  Essentially,  a  negligible  number  of  false  negatives 
are  observed.  One  false  positive,  on  the  average,  is  successfully  verified  in  a  given  image  in  addition  to 
the  correct  model  hypothesis.  The  false  positive  is  partly  due  to  the  symmetry  of  an  object,  where  the 
projected  boundary  can  achieve  good  support  from  the  set  of  image  features,  even  though  the  correspondences 
and  object  pose  are  incorrect  (such  as  in  figure  42).  False  positives  also  occur  due  to  confusion  between 
projectively  similar  objects,  that  is,  the  projective  transformation  generates  large  shape  equivalence  classes. 

Table 6: The average match statistics for using algebraic invariants within LEWIS taken over a large number
of scenes.

number of actual model instances                  1.0
total number of indexes formed                 1755.3
total number of individual hypotheses            60.4
total number of joint hypotheses                 72.7
number not requiring any verification             5.9
number rejected by algebraic test                41.1
number rejected through full back projection     23.7
number of correct hypotheses                      1.0
number of false positives                         1.0
number of false negatives                         0.0

3.4  Canonical  Frame  Invariants  Examples 

3.4.1  Classes 

Typical  (non-algebraic)  objects  in  the  library  are  shown  in  figure  29.  The  object  M  curves  are  sufficiently 
similar  (in  all  cases  there  are  only  two  inflections)  to  allow  a  grouping  of  the  library  into  a  number  of  classes, 
see  figure  30.  Indexing  is  then  hierarchical:  first,  sub-parts  (classes)  are  indexed  and  verified.  For  the  class 
verification,  rather  than  backproject  the  whole  model  curve,  the  M  curve  alone  is  projected  into  the  canonical 
frame.  It  is  verified  by  measuring  the  difference  in  areas  between  the  image  class  and  the  model  class  curves 
(computed  using  rectangular  quantisation  in  the  canonical  frame).  If  the  difference  is  sufficiently  small,  the 
hypothesis  is  accepted.  This  covers  the  non-completeness  of  the  canonical  frame  invariants.  Second,  if  the 
class  is  accepted,  hypotheses  are  generated  for  each  of  the  models  in  that  class.  Joint  hypotheses  are  then 
formed  and  verified  by  back  projection  to  the  target  image  using  the  entire  boundary  model  curve. 

The  efficiency  of  the  indexing  process  can  be  demonstrated  empirically:  from  a  series  of  typical  images 
an  average  of  56  M  curves  were  observed;  27.8%  of  these  produced  class  hypotheses  on  indexing;  and  23.9% 
of  these  were  verified  as  classes  (only  6.6%  of  the  original  number  of  M  curves).  Note  that  although  a  large 
number  of  classes  were  hypothesised  in  the  scene,  only  14.0%  of  the  indexed  hypotheses  were  later  found  to 
be  incorrect.  Based  on  these  preliminary  results  (56  M  curves  found  in  image,  10  model  curves,  no  false 
positives)  it  would  seem  that  there  is  not  an  excessive  tendency  towards  false  positives. 

3.4.2  Recognition  Examples 

The first example, in figure 31, shows a simple unoccluded view of model 0. This object can be
recognised  using  up  to  two  classes.  First,  the  classification  algorithm  locates  classes  0  and  1  as  marked  in 
figure  31b,  and  uses  these  to  form  a  single  joint  hypothesis  by  the  procedure  of  section  3.1.3.  The  joint 
hypothesis  is  verified  through  backprojection  in  which  92.8%  of  the  model  outline  is  matched  to  image  data. 
A  100%  confidence  is  not  found  (as  would  be  expected  for  an  unoccluded  object)  because  the  Canny  edge 
detector  fails  to  extract  and  localise  all  of  the  object  edges  correctly.  This  results  mainly  from  specularity 
on  the  object.  The  same  effect  can  be  observed  in  all  of  the  images  in  this  section  because  the  objects  are 

Figure 29: For the model base consisting of the four spanners there are ten useful M curves. These are
shown by thick lines and labeled (a) to (i). Due to the projective similarity of (e), (f) and (h), eight classes
are sufficient to represent the local shapes of the spanners. The correspondence between the M curves and
the classes is given in the table. The global shape of each spanner is also required for recognition; this includes
the geometric constraints between each class (see section 3.1.3 for details), and also the entire set of edge
locations and orientation data, which is used for verification.
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Figure  30:  The  canonical  curves  for  the  four  models  shown  in  figure  29.  Three  images  of  each  object  were 
used  and  the  curves  superimposed;  the  very  close  match  between  each  curve  highlights  the  stability  of  the 
construction.  Note  the  similarities  between  signatures  (e),  (f)  and  (h);  these  are  essentially  the  same  and 
are  therefore  represented  by  the  same  class.  All  the  other  signatures  are  in  their  own  class. 




Table 7: Matching statistics for figures 34 to 37. The number of M curves extracted from the images and
how many class hypotheses result from indexing are shown. The class hypotheses are used to form joint
hypotheses (jh) that are verified or rejected by the following tests: if a larger subsuming joint hypothesis has
already been accepted; if a good model to image projectivity cannot be computed; if backprojection results in

Figure    #M curves    #classes    #jh    #no verification    #poor proj.    #poor pose
34        42           13          18     0                   1              5
36        79           18          23     2                   0              3
37        99           24          39     4                   1              2


metallic.  Another  cause  of  edge  segmentation  failure  is  due  to  the  finite  thickness  of  the  objects,  as  discussed 
in  section  2.5.1.  Frequently,  an  edge  extracted  from  the  image  can  swap  between  portions  of  the  outline  on 
the  upper  and  lower  surfaces  of  an  object.  As  the  canonical  construction  is  local  this  does  not  present  a 
major  problem,  though  its  effects  are  occasionally  noticeable. 

In  figure  32  the  recognition  system  is  tested  on  a  more  complex  scene  where  there  is  clutter  and  occlusion. 
A  single  class  is  found  for  model  3  (class  5),  which  is  then  localised  correctly  in  the  image  to  give  55.5%  edge 
support.  Although  a  total  of  16  class  hypotheses  were  formed,  yielding  22  joint  hypotheses,  only  the  correct 
hypothesis  was  given  sufficient  confidence  by  backprojection  (over  50%  projected  edge  support). 

The canonical frame construction works very well under significant perspective distortion. This is demonstrated
in figure 33. For this relatively simple scene three classes are found, and only one produces a
hypothesis that passes through the object verification procedure. This gives an 83.6% edge match. As may be
seen from figure 29, within the range of typical signature variation, model 2 (which is the one identified in
figure 33) is projectively 2-cyclic^. Thus, the spanner will always be projected into the image in two different
poses differing by the equivalent of a 180° rotation, and still match correctly.

Figures  34  to  37  show  further  recognition  examples  in  which  the  correct  objects  are  always  recognised. 
No  false  positives  were  found  in  any  of  the  images,  though  this  is  not  always  the  case.  Although  some 
instances  have  sufficient  edge  support  the  hypothesis  is  rejected  based  on  size,  as  described  in  section  3.1.4. 
For example, model 0 was identified but subsequently rejected, as shown in figure 35. Full details of the
recognition  performance  are  given  in  table  7. 

The  algebraic  invariant  and  canonical  frame  representations  can  be  independently  applied  to  an  image  to 
recognise  objects  of  both  types.  Figure  38  shows  an  example  of  recognition  for  both  index  methods  together. 


3.5  Complexity 

The  grouping  cost  incurred  in  forming  the  invariants  was  discussed  in  section  2.4.  Here  we  first  propose  a 
simple  model  for  recognition  complexity,  and  then  verify  this  experimentally. 

^6 An object is projectively 2-cyclic if there is a view of the object for which it can be mapped onto itself with a 180° rotation.

Figure 31: (a) shows an unoccluded view of model 0. The classifier correctly locates classes 0 and 1 in (b).
These are used to form a joint hypothesis for model 0 which is verified using back projection that finds 92.8%
image support for the model. This is the only model match found that has a reasonable pose (that is, the
object is not too small). Note that very good registration of the object is achieved in (c), when both M
curves are used to compute the model to image transformation. Sometimes, as in (d), if a single M curve
is used the registration is good in the region of the curve, but extrapolates poorly over the rest of the object.
In this case a single M curve is still sufficient for recognition as a 68.9% projected edge match was found.

Figure 32: For the occluded and cluttered view of the spanner (a), there are a large number of bitangents,
(b). Note that bitangents are computed only along single continuous edgel chains and not between distinct
curves. This further ensures a linear grouping cost. After M curve formation and indexing, a total of 16
potential class matches were found. However, the one that correctly identifies model 3, marked in (c), was
the only one that produced a sufficiently high verification score (55.5%) to be accepted.


3.5.1  Complexity  Model 

A  major  concern  with  the  effectiveness  of  an  indexing  function  is  the  probability  that  an  image  measurement 
taken  from  background  clutter  actually  indexes  a  model.  Often,  it  is  suggested  that  the  number  of  clashes 
produced within the hash table is important, but this is not the case. The hash table is simply an implementation
of the index space, and should be designed so that only objects with matching image measurements
are returned rather than those having only matching hash key values^7.
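
A minimal sketch of this design point, assuming a hypothetical fixed-size bucket array keyed by a many-to-one hash of the quantised index vector. The names `IndexTable` and `quantise` and the bucket count are illustrative and are not taken from LEWIS. Entries that merely share a hash key are filtered out by comparing the stored cell with the query cell:

```python
from collections import defaultdict


def quantise(index, delta):
    """Map a vector of invariant values onto integer cell coordinates."""
    return tuple(int(v // delta) for v in index)


class IndexTable:
    """Index space implemented as a hash table with a many-to-one key."""

    def __init__(self, delta, num_keys=64):
        self.delta = delta          # bucket size along each index dimension
        self.num_keys = num_keys    # number of distinct hash keys (assumed)
        self.buckets = defaultdict(list)  # hash key -> [(cell, model), ...]

    def _key(self, cell):
        # Many cells can share one key, so a key match alone proves nothing.
        return hash(cell) % self.num_keys

    def add(self, index, model):
        cell = quantise(index, self.delta)
        self.buckets[self._key(cell)].append((cell, model))

    def lookup(self, index):
        cell = quantise(index, self.delta)
        # Return only entries whose cell matches the measurement's cell.
        return [m for c, m in self.buckets[self._key(cell)] if c == cell]
```

With `num_keys=1` every entry collides on the same hash key, yet `lookup` still returns only the models stored in the queried cell, which is the behaviour the paragraph asks for.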

Here  an  informal  argument  is  given  that  determines  the  likelihood  that  a  random  measurement  will  index 
an  actual  model;  it  shows  that  the  indexing  paradigm  is  (non-asymptotically)  constant  time,  or  at  least  can 
be  made  so  with  judicious  use  of  the  indexes.  Consider  a  measure  for  a  set  of  features  that  forms  an  n 

^7 The function mapping index values onto hash keys is many-to-one.

Figure 33: Even under severe perspective distortion the recognition system performs well and finds model 2
with 83.6% confidence. Note that an affine description, such as the footprints in [Lamdan88], would fail in
this case.


Figure  34:  Single  classes  are  sufficient  to  recognise  the  two  model  instances  shown  in  (b).  The  redundancy  of 
the  canonical  frame  representation  gives  much  better  tolerance  to  occlusion  than  global  shape  methods.  The 
left  hand  object  gained  67.1%  boundary  support,  and  the  right  object  81.6%. 

Figure 35: In figure 34 incorrect objects were identified. Here, model 0 receives 77.9% edge match, but has a
total projected object width of 24.7 pixels, which is too small to be a reasonable object projection.


Figure  36:  Two  classes  are  recognised  and  joined  into  a  single  joint  hypothesis  to  recognise  model  3  with 
68.0%  edge  support. 


dimensional index; assume that each dimension has the same behaviour. Let each index cover a segment on
the real line from $i_0$ to $i_0 + L$ (figure 39), and the quantisation along the line be a constant quantity $\delta$ over
the line segment^8. There are $b = L/\delta$ buckets along the line, and so for $n$ indexes and assuming that the
measured invariants have a constant PDF over the invariant space^9, the probability of hitting any cell at
random is $1/b^n$. If there are $A$ models in the library, each with $a$ shape descriptors, and each invariant can
be measured up to an error of $\pm\delta\epsilon/2$, $\epsilon \in \mathcal{N}$ (the set of natural numbers), there will be $a\epsilon A$ entries in the
table^10. If it is assumed that these entries are spread uniformly over the hash table, the chances of indexing


^8 More exactly a logarithmic scale should be used as the errors in invariant indexes tend to be proportional to the invariant
values [Forsyth91].

^9 This claim is a current topic of research, and should be compared to the work of Hopcroft, et al. [Mundy92] and
Maybank [Maybank93].

^10 For efficiency reasons during recognition only a single cell will be read. Models are not stored in single cells, but in as many
as defined by the range $\delta\epsilon$ which is the expected measurement error. This contrasts with storing models in single cells and then


Figure  37:  Both  models  1  (^1A%  support)  and  S  (Jb.l7o)  are  correctly  recognised  and  projected  into  the 
image as shown in (b).


Figure  38:  A  demonstration  that  both  types  of  invariant  index  can  be  used  to  recognise  objects  in  a  single 
image  (by  applying  the  invariant  constructions  independently  within  LEWIS).  The  bracket  is  indexed  using 
algebraic  invariants  and  the  spanner  is  indexed  using  the  canonical  frame  signature. 

Figure 39: Although the index space is multidimensional, it can loosely be assumed to be isotropic in all
directions. Considering only the x direction: the index space ranges from $i_0$ to $i_0 + L$ with bucket quantisation
of size $\delta$. On measurement of an index $i$ in a scene, all the buckets within the measurement-error range of $i$
must be searched in the index space. These cells are shaded grey.

a model through noise is $(a\epsilon A)/b^n$.

This analysis means that there is an algorithmic complexity of $O(k_1 + k_2\,a\epsilon A/b^n)$, where $k_1$ is the cost of
edge detection, feature extraction and grouping (essentially constant), and $k_2$ another constant dependent
on the form of the invariants, etc. It can be seen immediately that by making $n$ large, the term dependent
on the number of models $A$ becomes arbitrarily small, and so recognition time tends towards $k_1$, a constant.
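
The effect of the index dimension can be illustrated numerically. The sketch below assumes the uniform-occupancy model of this section: b buckets per dimension, n index dimensions, A models, a shape descriptors per model, and each descriptor spread over eps cells. The function name and the sample values are illustrative only, and the result is a probability only while it stays below 1.

```python
def false_index_rate(b, n, a, eps, A):
    """Expected number of table entries hit by one random measurement.

    Follows the model (a * eps * A) / b**n: a*eps*A occupied entries
    spread uniformly over b**n cells.
    """
    return (a * eps * A) / (b ** n)


# Hypothetical values: 100 buckets per dimension, 33 models with
# 4 descriptors each, every descriptor spread over 3 cells.
for n in (1, 2, 3):
    print(n, false_index_rate(b=100, n=n, a=4, eps=3, A=33))
```

The rate is linear in the number of models A for fixed n, and falls by a factor of b for each extra index dimension, which is the behaviour the argument relies on.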

There  are  two  problems  associated  with  making  n  large: 

1.  For  algebraic  invariants  there  is  little  control  over  n.  If  a  minimal  feature  group  is  used  there  is  no 
control,  but  by  using  larger  structures  n  can  be  increased.  However,  the  grouping  task  may  then 
become  harder.  Alternatively  one  could  index  using  less  discriminatory  invariants  and  then  group 
using  results  of  this  first  indexing  stage  before  forming  higher  order  invariants  and  indexing  a  second 
time.  For  the  invariants  of  other  structures,  such  as  canonical  frame  invariants,  n  can  be  made  large 
(subject  to  the  noise  present  in  the  curve). 

2.  Making  n  too  large  means  that  the  problem  of  constructing  an  efficient  hashing  function  must  be 
considered. 

During  the  development  of  LEWIS  [Rothwell94]  it  was  found  that  an  invariant  composed  of  a  conic  and  two 
lines  gave  insufficient  discrimination  between  objects.  However,  as  an  example  of  the  above  argument,  when 

indexing  over  a  range. 
an extra line was used to make n = 3 rather than n = 1 the invariant increased in utility. Because of the
grouping  heuristics  used  in  the  system  there  was  no  loss  of  efficiency  but  rather  a  marked  improvement  in 
performance. 

3.5.2  Empirical  Assessment 

The indexing technique computes a number of invariants that is entirely dependent on the number of image
features,  though  only  a  few  of  these  will  be  turned  into  hypotheses  on  indexing.  Indexing  dramatically 
reduces  the  time  taken  for  the  entire  recognition  process.  It  was  argued  above  that  there  should  be  a  small 
linear  growth  in  the  number  of  hypotheses  created  as  the  size  of  the  model  base  grows. 

The  linear  growth  is  demonstrated  in  figure  40.  The  graph  shows  data  collected  over  fifty  evaluations 
of  the  recognition  system  in  which  a  single  model  from  the  model  base  was  placed  in  a  scene  and  partially 
occluded  by  other  objects  that  are  not  in  the  model  base.  Other  non-library  objects  were  also  placed  in  the 
scene  as  clutter;  figure  26  shows  a  typical  scene.  The  average  number  of  hypotheses  computed  as  more  objects 
were  added  to  the  library  is  plotted.  The  first  model  added  to  the  library  always  corresponded  to  the  actual 
model  in  the  scene.  Although  15.8%  of  the  hypotheses  were  for  the  correct  model  (this  is  for  when  a  total  of 
33 objects were present in the library), as predicted by the theory, the shape of the graph is predominantly
linear.  The  real  benefit  of  indexing  becomes  apparent  when  one  considers  how  many  hypotheses  would  be 
produced  if  an  alignment  technique  were  used  (maintaining  the  same  grouping  methods).  On  average,  over 
2000  feature  groups  existed  for  each  image,  and  so  2000  hypotheses  would  be  produced  for  each  model  feature 
group  in  the  library  (generally  there  are  four  or  five  feature  groups  per  object  and  so  the  situation  would  be 
far worse). This would result in about $7 \times 10^4$ hypotheses for the entire model base compared to fewer than
the 60 produced when indexing is used. As these all have to be verified it is clear that indexing produces a
dramatic  improvement  in  the  system  efficiency. 
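
As a rough check on these figures, assume one hypothesis per image feature group per library object, with the 33-object library mentioned above. This is an assumption; the source does not spell out the calculation. The counts are then of the order:

```latex
N_{\text{align}} \approx 2000 \times 33 = 6.6 \times 10^{4}, \qquad
N_{\text{index}} < 60, \qquad
\frac{N_{\text{align}}}{N_{\text{index}}} > 10^{3}.
```

With four or five feature groups per object, the alignment count would be several times larger still.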

4  Discussion 

We  have  shown  how  the  use  of  invariants  as  index  functions  avoids  search  at  two  stages  of  the  recognition 
process.  First,  indexes  generate  hypotheses  which  give  direct  access  to  models,  avoiding  a  search  through 
the  model  library.  Second,  at  the  hypothesis  combination  stage,  invariants  of  the  geometric  relationships 
between feature groups, for instance a pair of M curves, permit the efficient construction of extended feature
groups. 


4.1  Verification 

The  final  stage  of  recognition  in  most  model-based  systems  [Huttenlocher87,  Lowe87]  is  to  verify  model- 
to-image  hypotheses.  In  the  system  described  here,  this  is  a  layered  process:  first  determine  if  there  is 

Figure 40: (a) The number of hypotheses that have to be verified varies with the number of models from the
model base. The results show an average over fifty scenes containing only one object in the model base, but
with other clutter and occlusion present. Over 2000 indexes are created for the scene, which corresponds to
the number of hypotheses that would have to be verified per model feature group if an alignment paradigm
were used. Therefore, with alignment there is a rapid linear growth in the number of hypotheses created as
the model base is expanded. However, the number of hypotheses created through indexing remains substantially
lower; the detail depicted in (b) demonstrates that approximately linear growth with a low constant of
proportionality is observed. This ties in with the theoretical prediction of section 3.5.1.


a  common  projective  transformation  for  all  geometric  components  (lines,  conics,  M  curves)  of  the  joint 
hypotheses.  Second,  back  project  geometric  features  and  measure  image  support. 
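
The second layer can be sketched as follows. The sketch assumes the model boundary is sampled as 2D points, the model-to-image projectivity is a 3x3 matrix H acting on homogeneous coordinates, and an image edge point within a hypothetical tolerance of 2 pixels counts as support. The 50% acceptance threshold mirrors the figure quoted for the earlier examples; the function names are illustrative.

```python
import math


def project(H, point):
    """Apply a 3x3 projectivity H to a 2D point (x, y)."""
    x, y = point
    u, v, w = (H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3))
    return (u / w, v / w)  # assumes the point is not mapped to infinity


def edge_support(H, model_points, edge_points, tol=2.0):
    """Fraction of projected model points within tol of some image edge point."""
    projected = [project(H, p) for p in model_points]
    supported = sum(
        1 for q in projected
        if any(math.dist(q, e) <= tol for e in edge_points)
    )
    return supported / len(projected)


def accept(H, model_points, edge_points, threshold=0.5):
    """Accept the hypothesis when edge support exceeds the threshold."""
    return edge_support(H, model_points, edge_points) > threshold
```

The brute-force nearest-edge search is quadratic and is used here only for clarity; a distance transform of the edge map would be the usual choice for real images.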

This strategy can fail, generally as a false positive, for two principal reasons. First, only projective geometric
structure is used and many object boundary shapes are equivalent up to a projective transformation.
In  order  to  discriminate  further,  it  is  necessary  to  assume  viewing  conditions  where  an  affine  transformation 
is  valid,  or  to  use  a  calibrated  camera  which  enables  scaled  Euclidean  reconstruction.  The  second  type  of 
failure  is  associated  with  incomplete  image  support,  which  is  discussed  in  more  detail  in  the  next  section. 
Examples  of  both  these  failure  cases  are  given  in  figures  42  and  43. 


4.1.1  Image  Support 

Hypothesis  validation  based  on  image  support  is  faced  with  two  opposite  failure  mechanisms:  too  little 
support;  or  too  much.  When  the  object  boundary  exhibits  little  image  contrast,  a  significant  fraction  of  the 
achievable  perimeter  is  unrecoverable  by  edge  detection  algorithms.  On  the  other  hand,  when  the  background 
is  highly  textured  or  cluttered,  high  support  can  be  achieved  for  an  incorrect  hypothesis.  Two  examples  of 
the latter failure mechanism are shown in figure 45.

Both  of  these  problems  are  symptomatic  of  having  too  sparse  a  description  for  the  object.  Thus  far  we 

Figure 41: Two objects from the model base are recognised correctly despite strong perspective distortion.


Figure 42: The spanner from figure 27 is shown and recognised, but with the wrong orientation; due to texture
in  the  image  a  52.1%  edge  match  is  still  found. 


Figure  43:  An  object  from  the  model  base  which  is  superimposed  in  (b)  can  be  recognised  with  over  50%  edge 
support  from  the  specularity  on  the  pair  of  scissors  in  (a). 


Figure 44: Six objects were recognised from image (a). The four correct matches are shown in (b), with the
two incorrect given in figure 45. The worst of the four correct identifications had two invariants and 60.3%
match. 


Figure 45: The two incorrectly recognised objects from the image in figure 44(a). Unlike the matches shown
in  figure  44(b)  these  two  objects  were  hypothesised  by  only  a  single  invariant,  and  had  less  than  52%  image 
support.  In  both  cases  image  support  is  provided  by  features  that  have  already  been  used  to  verify  hypotheses 
in  figure  44(b).  The  straight  line  across  each  image  is  the  projection  of  the  line  at  infinity  from  the  acquisition 
images.  Its  closeness  to  the  image  center  indicates  an  unlikely  object  pose  (see  the  discussion  in  section  4.1.2 
for  details). 
have relied just on the boundary curve of the object and have ignored any properties of the interior region. It
is  certainly  reasonable  to  use  our  knowledge  of  the  model  coordinate  frame  in  the  image  to  extract  viewpoint 
independent  texture  measures.  Even  very  simple  measures  would  have  eliminated  the  false  positive  shown 
in  figure  42.  It  is  also  possible  that  very  simple  intensity  measures  on  the  internal  object  surface  can  be 
used. For example, the ratio of intensities in the neighbourhood of step discontinuities is a reliable measure
of  albedo  ratio  [Nayar93].  Even  very  weak  intensity  measures  for  discrimination  can  be  used  to  increase 
confidence  in  a  hypothesis  or  to  break  ties  between  two  very  similar  geometric  configurations. 

There  is  also  the  open  question  of  whether  a  feature  should  be  used  to  support  a  hypothesis  when  it  has 
already  been  used  in  a  confirmed  hypothesis,  as  illustrated  in  figure  45. 

When insufficient image support is found it will be necessary to invoke additional understanding of the scene,
of the form “this object is on top of another, and therefore occludes it”. An understanding of this kind can
be  used  to  guide  a  search  for  further  object  features  which  support  the  explanation  for  missing  features  in 
the  first  object. 

4.1.2  Projective  Transformation 

One  limitation  of  the  full  projective  transformation  is  that  unreasonable  perspective  projections  are  allowed. 
An  example  is  shown  in  figure  43  where  the  entire  object  is  backprojected  onto  a  thin  specularity.  To 
eliminate  such  projections,  we  propose  a  stratified  solution  involving  progressively  more  knowledge  of  affine 
followed  by  Euclidean  structure  and  ultimately,  full  perspective  camera  calibration. 

1.  Real  cameras 

For  a  physical  camera  the  object  must  lie  in  front  of  the  image  plane.  More  precisely  the  object  plane 
cannot  intersect  the  focal  plane  (a  plane  parallel  to  the  image  plane,  containing  the  optical  center). 
This  is  captured  by  projecting  the  ideal  line,  that  is  the  line  at  infinity,  of  the  model  image  onto  the 
target  image.  Any  case  in  which  the  ideal  line  passes  through  the  convex  hull  of  the  hypothesised 
image  features  can  be  ruled  out,  since  objects  are  assumed  to  be  finite.  More  generally,  if  the  model 
ideal  line  is  observed  within  the  finite  bounds  of  the  image  plane,  the  object  pose  must  be  sufficiently 
extreme that the hypothesis can be rejected. In our experiments, about 25% of false positives due to
poor poses were ruled out by this constraint (see figure 45 and [Rothwell94] for details).

2.  Similarity  structure 

If  the  acquisition  view  is  taken  with  the  object  in  a  fronto-parallel  plane,  one  can  calculate  slant  and 
tilt  of  the  plane  of  the  object  in  the  target  image.  This  pose  calculation  does  not  require  full  calibration 
of  internal  camera  parameters.  In  the  perspective  case,  the  computation  does  not  require  focal  length 
and  in  the  affine  case  only  the  image  pixel  aspect  ratio  is  required. 

3.  Size 

Two  additional  calibration  parameters  are  essential  for  the  computation  of  size.  The  first  is  distance 
from the camera. In order to estimate this distance, an approximate knowledge of object area, a
Euclidean  measure,  and  focal  length  are  required.  The  second  calibration  parameter  is  the  physical 
size  of  pixels  on  the  image  plane. 
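
The first of these tests (real cameras) can be sketched as below. The sketch assumes the model-to-image projectivity is available as a 3x3 matrix H acting on homogeneous points, so that lines transform by the inverse transpose of H; the model's ideal line (0, 0, 1) then maps, up to scale, to the third row of the inverse of H. The line meets the convex hull of a point set exactly when the points do not all lie strictly on one side of it. Function names are illustrative.

```python
def image_of_ideal_line(H):
    """Image of the model's line at infinity under the 3x3 projectivity H.

    Returned up to scale as (l0, l1, l2), using the cofactors that form
    the third row of the inverse of H.
    """
    (a, b, _), (d, e, _), (g, h, _) = H
    return (d * h - e * g, b * g - a * h, a * e - b * d)


def ideal_line_crosses_features(H, image_points, eps=1e-9):
    """True if the projected ideal line meets the convex hull of the points."""
    l0, l1, l2 = image_of_ideal_line(H)
    sides = [l0 * x + l1 * y + l2 for x, y in image_points]
    all_positive = all(s > eps for s in sides)
    all_negative = all(s < -eps for s in sides)
    # The hull is missed only if every point is strictly on one side.
    return not (all_positive or all_negative)
```

A hypothesis for which `ideal_line_crosses_features` returns `True` would be rejected under this test. The stronger variant described in the text, rejecting when the line falls anywhere inside the image bounds, can be obtained by passing the four image corners as `image_points`.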

4.2  Future  Work 

1. For non-algebraic curves there are other invariants available which do not require M curves. For example,
Van Gool et al. [VanGool91] exploit single inflections as distinguished points; Carlsson [Carlsson92]
fits conics tangent to the curve at four points. The latter procedure is applicable even to convex curves.
There  are  numerous  other  covariant  constructions  (for  example  tangents  between  two  curve  segments) 
that  can  be  utilised  to  generate  distinguished  points  and  hence  invariants.  The  natural  stage  for 
integrating  the  various  categories  of  invariant  is  at  hypotheses  combination.  Again  joint  invariants 
between  features  involved  in  more  global  invariant  groupings  can  efficiently  be  used  to  build  larger 
model  hypotheses.  This  integration  strategy  is  currently  under  investigation  [Rothwell93b]. 

2.  Feature  grouping  based  on  sequential  connectivity  is  a  somewhat  fragile  process.  It  is  easy  to  encounter 
large gaps in the object boundary due to low image contrast and occlusion. Any recognition algorithm
will be adversely affected by these effects; however, recovery is impossible when the index is
constructed based solely on the assumption of boundary connectivity. An immediate way to overcome
this problem is to use as many feature groups as possible for a given object to derive a redundant
description; however, many object shapes do not have sufficient complexity to define more than a few
independent feature groups.

Current work is investigating how grouping can be improved for applications where segmentation provides
poor boundary connectivity, for instance in aerial reconnaissance scenes. The primary grouping
relations  are  proximity  and  collinearity.  By  constructing  a  Delaunay  triangulation  of  the  set  of  line 
segment  endpoints,  it  is  computationally  feasible  to  establish  line  segment  sequences  which  are  not 
actually  connected  topologically.  Similarly,  line  segments  which  are  reasonably  close  and  collinear  can 
also  be  grouped  efficiently. 

3.  We  observe  that  a  useful  goal  for  image  feature  segmentation  and  grouping  is  to  provide  feature  groups 
which support invariant computations, for example the algebraic curves and M curves used in the current
system. As a consequence, new segmentation and grouping algorithms
can be tested by evaluating the accuracy and stability of the resulting invariants. Additionally, the
discovery  of  new  invariant  constructions  will  require  the  development  of  associated  feature  extraction 
algorithms.  Since  we  know  that  the  robustness  of  recognition  is  largely  dependent  on  the  success  of 
such  group  constructions,  we  can  profitably  focus  research  on  this  stage  of  the  system. 

4.  We  have  demonstrated  a  recognition  complexity  of  low  gradient  linear  growth  with  the  size  of  the 
model  base,  and  developed  a  statistical  model  of  this  performance.  These  results  are  still  preliminary, 
firstly because the model base is still relatively small (fewer than 50 models), and secondly because the
objects are fairly similar. It is an open question as to whether this is simply clustering behaviour and
if for a large model base (several thousand objects), recognition would remain asymptotically constant
time.

5.  A  number  of  recent  papers  have  demonstrated  that  invariants  of  3D  structures,  under  3D  projective 
transformations,  can  be  extracted  from  image  projections  of  the  structure.  These  can  be  obtained  from 
multiple  views  [Demey92,  Faugeras92,  Hartley92,  Koenderink91,  Mundy92,  Quan91]  or  from  a  single 
view  [Forsyth92,  Forsyth93,  Liu93,  Rothwell93a,  Rothwell94,  Wayner91]. 

We  propose  to  employ  our  experiences  with  LEWIS  by  building  an  improved  recognition  system  called 
MORSE  (Multiple  Object  Recognition  by  Scene  Entailment)  for  3D  structures.  To  recognise  such 
structures an improved architecture is required. For example: first, we require interactions between
different types of invariant working simultaneously in a single image; second, finer-grained communication
loops are required between the different processing layers than are provided within the current
grouping-indexing-correspondence architecture. These must be implemented in such a way as to ensure
that the implications of each local conclusion are understood by all other layers. Furthermore,
multiple  representations  of  objects  must  be  allowed  by  the  model  library.  For  instance,  curve  g  on  the 
spanner  in  figure  29  could  be  represented  both  as  a  concavity  curve,  and  as  a  five  line  sequence.  Both 
representations  should  be  included  in  the  model  library. 

Abstract

A  number  of  recent  papers  have  argued  that  invariants  of  three  dimensional  point  sets  in 
general  position  cannot  be  measured  from  a  single  image  [5,  7,  18,  28].  This  has  often 
been  misinterpreted  to  mean  that  invariants  cannot  be  computed  for  any  three  dimensional 
structure.  This  paper  proves  by  example,  that  invariants  can  be  measured  for  constrained 
three  dimensional  point  sets  from  a  single  image. 

Projective  invariants  are  derived  for  two  classes  of  object:  the  first  is  for  points  that  lie  on 
the  vertices  of  polyhedra,  and  the  second  for  objects  possessing  a  bilateral  symmetry.  These 
invariants  can  be  used  both  for  recognition  and  to  compute  projective  structure.  Examples 
are  given  of  invariants  computed  from  real  scenes  imaged  with  an  uncalibrated  camera. 

1  Introduction


Recognizing planar objects from a single, uncalibrated, perspective image obtained from
an  unknown  viewing  position  is  a  problem  that  naturally  suggests  projective  invariants; 
the  transformation  from  object  to  image  is  an  unknown  projective  transformation,  and 
the  use  of  invariants  eliminates  the  effect  of  that  transformation.  Projective  invariants  have 
consequently  had  a  substantial  impact  on  model  based  object  recognition  by  providing  useful 
examples  of  indexing  functions,  which  facilitate  rapid,  viewpoint  invariant,  access  to  models 
in  model  libraries.  Indeed,  a  plane  object  recognition  system  using  only  projective  invariants 
as  indexes  has  been  constructed  [34,  35,  37].  Recognition  has  been  demonstrated  for  a  model 
library  of  over  forty  objects,  with  the  system  tested  on  hundreds  of  images,  under  varying 
levels  of  occlusion  and  clutter. 

However,  until  quite  recently  there  was  a  notable  absence  of  index  functions  for  three 
dimensional objects. This is primarily because most research on the application of invariance
to  computer  vision  (e.g.  [1,  12,  20,  27,  41,  43,  45,  47,  46])  has  concentrated  on  planar 
objects  from  a  single  view,  and  three  dimensional  objects  from  two  views  or  range  data. 
Furthermore, a number of recent papers have argued that invariants cannot be measured
for  a  3D  set  of  points  in  general  position  from  a  single  view  [5,  7,  18,  28].  This  has  often  been 
misinterpreted  to  mean  that  no  invariants  can  be  measured  for  three  dimensional  objects 
from  a  single  image,  and  consequently  that  the  use  of  invariants  as  indexes  is  confined  to 
planar  objects.  In  contrast,  we  demonstrate  that  if  points  are  not  in  general  position,  but 
have  constrained  3D  positions,  then  invariants  are  available  for  use  as  index  functions. 

In  this  paper  we  describe  two  examples  of  constrained  (or  structured)  3D  point  sets.  In 
each case the assumed constraint can be verified to a certain extent from image measurements.
The two cases are:

1.  Polyhedral  cages:  The  points  lie  at  the  vertices  of  a  virtual  polyhedron  or  are 
the  vertices  of  a  real  polyhedron.  Allowing  virtual  polyhedra  means  that  the  objects 
we  recognize  need  not  necessarily  be  polyhedral.  The  polyhedral  class  we  examine 
has only trihedral vertices, and the points lie on shapes described by six planes. A
minimum  of  7  points  are  required  in  order  to  recover  invariants.  This  is  described  in 
Section  2.  In  general  each  plane  of  the  polyhedron  contains  only  four  points.  This 
demonstrates  that  the  invariants  extracted  are  of  the  3D  object,  since  five  planar  points 
are  required  for  a  projective  invariant. 

2.  Bilateral  symmetry:  The  point  set  has  a  bilateral  symmetry.  A  minimum  of  eight 
distinct  points  (four  symmetrically  matched  pairs)  are  required.  We  demonstrate  in 
Section 3 that the full 3D projective structure can be recovered from a single uncalibrated
perspective image. This reconstruction method should be compared to the
ones  presented  in  [14,  26]  which  exploit  symmetry  in  a  similar  manner  to  determine 
shape  under  perspectivity,  but  do  so  with  a  calibrated  camera. 

As  in  the  planar  case,  index  functions  based  on  invariants  can  be  used  in  a  recognition  system 
to  provide  direct  access  to  a  model  library,  irrespective  of  the  object  pose  and  intrinsic  camera 
parameters. 

Even  in  the  case  of  general  point  sets,  indexing  strategies  are  possible,  though  they  are 
less  efficient.  For  example,  under  weak  perspective,  indexing  functions  can  be  defined  that 
return values restricted to a line, rather than a point [7, 19]. The modelbase therefore
becomes a space filled with lines representing each model, where the lines have a finite
thickness  to  allow  for  measurement  error.  In  contrast,  in  the  invariant  indexing  case  each 
model  is  represented  by  points  and  an  associated  error  ball.  Clearly  the  false-positive  rate 
of the indexing stage (that an arbitrary value should by chance index a model) is necessarily
higher  when  the  space  is  filled  with  lines. 

In the following we adopt the notation that corresponding entities in two different coordinate
frames are distinguished by upper and lower case. In general, lower case is used for
image quantities, and upper for 3D quantities. Vectors are written in bold font, for instance
$\mathbf{x}$ and $\mathbf{X}$. Matrices are written in typewriter font, for example $\mathtt{c}$ and $\mathtt{C}$. With homogeneous
quantities, equality is up to a non-zero scale factor.

1.1  Geometric  invariants 

Invariants are properties of geometric configurations which remain unchanged under an
appropriate class of transformations [30]. Under a linear transformation of coordinates,
$\mathbf{X}' = T\mathbf{X}$, the invariant, $I(P)$, of a configuration $P$ transforms as

$$I(P') = |T|^{w}\, I(P),$$

and is called a relative invariant of weight $w$, where $P'$ is the transformed configuration. If
$w = 0$, the invariant is unchanged under transformations and is called a scalar invariant.
We  will  only  be  interested  in  scalar  invariants  in  this  paper. 
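
As a hedged illustration of the difference between relative and scalar invariants, consider the standard determinant (bracket) construction. This is our own worked example rather than part of the original derivation; the symbols $D$, $\mathbf{X}_k$ and $n$ are introduced only for this sketch.

```latex
% Assumption: X_1, ..., X_n are homogeneous n-vectors and T is a non-singular n x n matrix.
\begin{aligned}
D(\mathbf{X}_1,\dots,\mathbf{X}_n) &= \det[\mathbf{X}_1,\dots,\mathbf{X}_n], \\
D(T\mathbf{X}_1,\dots,T\mathbf{X}_n) &= \det\bigl(T[\mathbf{X}_1,\dots,\mathbf{X}_n]\bigr)
   = |T|^{1}\, D(\mathbf{X}_1,\dots,\mathbf{X}_n).
\end{aligned}
% Each bracket is therefore a relative invariant of weight 1.
% A ratio with equally many brackets above and below has weight 0 (the |T| factors cancel):
\frac{D_a\, D_b}{D_c\, D_d} \;\longmapsto\; \frac{|T|^2\, D_a\, D_b}{|T|^2\, D_c\, D_d} = \frac{D_a\, D_b}{D_c\, D_d}.
```

For homogeneous quantities the ratio must also be unaffected by rescaling each vector, which holds when every vector appears equally often in the numerator and the denominator.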

Here  we  will  be  concerned  with  projective  transformations,  so  T  is  a  general  non-singular 
square matrix acting on homogeneous coordinates. For planar configurations it is 3 x 3, and
for 3D configurations 4 x 4. Note that invariants are computed with respect to a transformation,
which  is  a  mapping  between  spaces  of  the  same  dimension. 

We are interested here in measuring invariants of 3D point sets from a perspective projection
of the configuration. For 3D objects the original and image spaces do not have the
same dimension. However, invariants of the 3D configuration can be measured from their
image projections for the cases discussed in this paper. We write P for the projection matrix
that covers a 3D Euclidean transformation of the object followed by perspective projection
onto the image. P is a 3 x 4 matrix mapping three-dimensional homogeneous coordinates
X onto two-dimensional homogeneous coordinates of the image plane x:

x  =  PX.  (1) 
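
A minimal numerical sketch of eqn (1) follows, assuming NumPy and made-up camera parameters. The matrix `K`, the rotation angle, the translation and the helper `project` are hypothetical and serve only to show that homogeneous coordinates are defined up to a non-zero scale.

```python
import numpy as np

# Hypothetical intrinsic parameters (not taken from the paper).
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical Euclidean motion: rotation about the Y axis plus a translation.
theta = 0.3
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[0.1], [-0.2], [5.0]])

P = K @ np.hstack([R, t])  # 3 x 4 projection matrix


def project(P, X):
    """Apply x = P X and return inhomogeneous image coordinates."""
    x = P @ X
    return x[:2] / x[2]


X = np.array([0.5, 0.25, 1.0, 1.0])   # homogeneous 3D point
print(project(P, X))
print(project(P, 3.0 * X))            # same image point: scale does not matter
```

Rescaling the homogeneous 4-vector leaves the image point unchanged, which is the sense in which equality holds "up to a non-zero scale factor".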

1.2  The  limitations  of  point  sets 

A set of 3D points is a very impoverished representation of shape. It contains the minimum
description possible (position) but no connectivity, and lacks the richness of curves
or  surfaces.  Furthermore,  unrestricted  point  sets  are  not  an  important  part  of  the  visual 
experience.  Far  more  significant  are  polyhedra,  faceted  surfaces,  and  smooth  surfaces. 

It  is  seldom  the  case  that  machine  recognition  systems  must  deal  with  wholly  generic 
objects;  more  typically,  objects  are  drawn  from  classes,  which  satisfy  geometric  constraints. 
For example, objects might be algebraic surfaces, generalised cylinders, polyhedral or symmetric.
Constraining objects to belong to classes very often results in 3D invariants that can
be measured from a single image. Some constraints are ineffective; for example, constraining
an object to be a smooth surface does not yield global 3D invariants from the outline.
However, a number of quite reasonable constraints result in invariants that can be measured
in a single view. Known examples include algebraic surfaces [13], rotationally symmetric
surfaces [23], and canal surfaces [32]. This paper adds polyhedral objects and objects with
a  symmetry  to  this  list. 

In addition, the images of constrained three dimensional objects often have natural grouping
mechanisms which are not available for general 3D point sets. For example, polyhedral
grouping involves constructing a line drawing, and grouping for images of objects with bilateral
symmetry involves a simple line search (see Section 3.1).


1.3  Invariant  computation

Recovering  geometric  invariants  of  3D  structures  from  images  can  proceed  in  two,  quite 
distinct,  ways  [15]:  these  are  explicit  reconstruction  and  implicit  reconstruction.  These 
approaches  are  mathematically  equivalent,  in  that  they  measure  the  same  numbers,  but 
are  conceptually  distinct,  and  may  have  very  different  noise  properties.  To  appreciate  the 
difference,  consider  a  stereo  pair  taken  with  uncalibrated  cameras,  depicting  a  set  of  matched 
points.  Given  the  matches,  it  is  possible  to  reconstruct  in  space  the  collection  of  points,  and 
then  to  measure  its  3D  projective  invariants  from  that  reconstruction.  This  is  an  explicit 
reconstruction.  An  alternative  approach  would  involve  writing  the  3D  invariants  as  functions 
of  the  coordinates  in  each  separate  image,  and  evaluating  those  functions  directly;  this  is 
called  implicit  reconstruction.  Each  approach  has  its  merits  and  its  difficulties  in  terms  of 
ease  of  application  and  sensitivity  to  measurement  error.  This  paper  emphasizes  explicit 
reconstructions of polyhedra, but shows an implicit construction in Section 2.6.2; in the
case of bilaterally symmetric objects, an implicit construction appears in Section 3.2 and
an explicit construction appears in Section 3.3.


2  Polyhedral  cages 

3D object vertices often lie on planes, which can be grouped to form a “virtual polyhedron”
(see the examples at the end of this section, such as Fig. 3f). It may be that the
object is actually polyhedral, but this is not necessary. The virtual polyhedron “cages”
the 3D points. Our route to forming the projective invariants of the 3D points
is to reconstruct the caging polyhedron in 3-space, and then to measure the projective
invariants for the planes of the polyhedron. The following sections outline the extraction of
3D projective invariants for polyhedra from a single perspective image.


2.1  Reconstruction  of  polyhedra  from  single  images 

The reconstruction of polyhedra has roots in the origins of the field, with the seminal work of
Huffman [17] and, independently, Clowes [8] on labeling schemes which allow qualitative
interpretation of polyhedral scenes. This work was continued and advanced by Waltz [44]
and  Mackworth  [25],  yielding  techniques  that  can  determine  the  type  of  a  polyhedral  edge 
(concave  or  convex)  and  can  parse  junctions  and  shadow  regions. 

Sugihara [40] reframed reconstructing polyhedra from single images as an algebraic problem,
and showed how to produce a parameterised 3D reconstruction from a single orthographic
or (calibrated) perspective image. However, to reduce the ambiguity of the 3D
reconstruction  to  scaled  Euclidean,  the  geometric  constraints  were  augmented  with  texture 
or  shading  information.  Our  contribution  in  this  section  is  to  demonstrate  that  the  structure 
recovered  from  the  algebraic  formulation  for  a  perspective  image  is  within  a  3D  projective 
transformation  of  the  actual  Euclidean  structure.  Consequently,  projective  invariants  of  the 
recovered  polyhedron  have  the  same  value  as  those  of  the  actual  polyhedron  giving  rise  to 
the  image. 

For general polyhedra, viewing issues become important; in particular, self occlusion can
lead to situations where the projective structure of the polyhedron can be recovered only in
part. This problem bears some relationship to the aspect graph of the polyhedron (clearly,
views that recover different parts of the projective structure lie in different cells of the aspect
graph), but has some crucial differences; the most significant is the fact that faces can be
inferred by assuming trihedral junctions (an assumption used throughout this paper), so
that occluded faces may appear in the solution. For example, all faces of a cube can be
correctly recovered, even though at most three are visible in any one view: at each
vertex that lies on the silhouette, the vertex and the two edges define the occluded plane,
and the three occluded planes define the occluded vertex. For more complex polyhedra,
complete reconstructions are not always possible; the problem is particularly severe when
there are views such that, for some faces, no element (edge or vertex) incident on those faces
is visible.

2.2  Forming  the  constraint  equations 

In  the  derivation  that  follows,  all  polyhedron  vertices  are  assumed  to  be  trihedral.  Trihedral 
junctions  are  stable  because  three  planes  generically  meet  in  a  single  point;  any  one  of  the 
planes  can  be  perturbed  resulting  in  a  single  vertex  close  to  the  original  vertex.  This  is 
not true for higher order junctions. This assumption has little effect on the mathematics
presented here, the only difference being that if a point lies on n (rather than 3) planes, n - 1
constraints can be formed rather than the 2 for the trihedral case.

A further assumption is that the polyhedra represent real solid objects, that is, a polyhedron
is not just a polygon or a group of polygons embedded in 3-space. Finally, a genericity
assumption  will  be  necessary;  the  details  of  this  assumption  will  be  developed  along  with 
the  mathematics  below. 

The  derivation  of  our  method  is  similar  to  that  of  Sugihara  except  that  a  perspective 
imaging model is used, and that the camera is uncalibrated. The input is a set of unknown
planes that are bounded by an arbitrary number of edges (≥ 3), and a set of points (projected
vertices) whose image locations are known. Under the trihedral assumption each point
lies on exactly three of the given planes.

Without  loss  of  generality,  a  coordinate  frame  is  chosen  so  that  the  world  X  and  Y  axes 
lie  in  a  plane  parallel  to  the  image  plane  with  directions  shown  in  Fig.  1;  the  origin  is  at  the 
camera  optical  centre.  The  Z  axis  lies  along  the  principal  axis  of  the  camera,  and,  again 
without  loss  of  generality,  the  image  plane  is  the  plane  Z  =  1.  The  differences  between 
the  actual  camera  coordinates  and  this  coordinate  system  will  be  taken  up  in  the  projective 
ambiguity  below. 

The $j$-th plane in the polyhedron has equation:

$$a_j X + b_j Y + c_j Z + 1 = 0. \qquad (2)$$

Note that this assumes that no plane in the polyhedron passes through the optical centre,
that is, the optical centre is in general position. The plane can then be represented as
$\mathbf{v}_j = (a_j, b_j, c_j, 1)^T$ in homogeneous coordinates in projective 3-space, with $j \in \{1, \ldots, n\}$,
where $n$ is the number of planes on the object that are either visible or can be inferred.

The vertices of the polyhedron are $(X_i, Y_i, Z_i)$; under a pinhole projection model, projection
onto the plane $Z = 1$ maps the point $(X_i, Y_i, Z_i)$ to $(x_i, y_i)$ where:

$$x_i = \frac{X_i}{Z_i} \quad \text{and} \quad y_i = \frac{Y_i}{Z_i}.$$

Now, in 3D at vertex $i$, there are three incidence conditions, each of which takes the form:

$$a_j X_i + b_j Y_i + c_j Z_i + 1 = 0,$$
Figure 1: The camera model used with the world and camera coordinate frames shown. A
perspective imaging model is used to project the world point X to the camera plane point x.
Note that only image coordinates can be measured, rather than the camera coordinates of an
image point (these two frames are linked by an affine map).


and exist because three planes are incident on each vertex. At the $i$-th vertex, dividing the
incidence conditions by $Z_i$ and setting $t_i = 1/Z_i$ gives:

$$a_j x_i + b_j y_i + c_j + t_i = 0. \qquad (3)$$

Note that this equation is linear in the unknowns $\{a_j, b_j, c_j, t_i\}$.

From the trihedral assumption it is known that each image point lies on three planes,
say $j \in \{p, q, r\}$. Since the $t_i$ cannot be measured, they must be eliminated between pairs
of equations (one equation for each of the three planes at the vertex), yielding two linear
equations at each observed image vertex:

$$(a_p - a_r)\,x_i + (b_p - b_r)\,y_i + (c_p - c_r) = 0,$$
$$(a_q - a_r)\,x_i + (b_q - b_r)\,y_i + (c_q - c_r) = 0. \qquad (4)$$

For  n  observed  or  inferred  object  planes  and  m  observed  image  vertices,  there  are  2m 
equations  (which  may  or  may  not  be  independent)  in  3n  unknowns: 


$$A(\mathbf{x}, \mathbf{y})\,\mathbf{w} = 0, \qquad (5)$$

where $\mathbf{x} = (x_1, \ldots, x_m)^T$ and $\mathbf{y} = (y_1, \ldots, y_m)^T$ are the observed coordinates of the image
vertices, $A(\mathbf{x}, \mathbf{y})$ is a $2m \times 3n$ array of constraints (2 constraints for each visible point),
and $\mathbf{w} = (a_1, b_1, c_1, a_2, \ldots, c_n)^T$. The solution space of this system is the null space or
kernel of the matrix $A$, which represents the set of polyhedra that can be reconstructed from
the image. The dimension of the kernel is $d \geq 3n - 2m$. The inequality arises because the set
of equations $A$ may contain linear dependencies, in which case the polyhedron is not position free;
otherwise we need consider only the equality. The position free property is described in
more  detail  by  Sugihara  [40]  and  in  Section  2.3.  In  all  cases,  the  dimension  of  the  kernel 
immediately  gives  the  number  of  degrees  of  freedom  of  the  family  of  polyhedra  that  satisfy 
the  image  constraints,  and  d  is  exactly  the  number  of  parameters  required  to  define  the 
solution  set. 

2.3  Interpreting  the  kernel 

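The solution of eqn (5) can be written $\mathbf{w} = \sum_{i=1}^{d} \lambda_i \mathbf{b}_i$, where the kernel of $A$ is spanned
by the vectors $\mathbf{b}_i$, $i \in \{1, \ldots, d\}$, and the $\lambda_i$ are scalar coefficients. The sub-space spanned by the $\mathbf{b}_i$ is a $d$ dimensional subspace
of the $3n$ dimensional polyhedral space.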

This  sub-space  includes  reconstructions  where  all  vertices  are  coplanar,  and  all  planes 
are  the  same  plane.  For  example,  the  physical  image  plane  is  just  such  a  reconstruction. 
There is a three-parameter family of planar reconstructions which can be written as:

$$\mathbf{w} = (u, v, w, u, v, w, \ldots, u, v, w)^T. \qquad (6)$$

The three parameters of the family are $u$, $v$ and $w$, each choice corresponding to a single
plane  as  solution.  It  is  necessary  to  avoid  such  degenerate  reconstruction  in  order  to  compute 
true  3D  reconstructions. 

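Planar reconstructions can be isolated and excluded from consideration by linearly transforming
the basis so that its first three vectors are $\mathbf{b}_i$, $i \in \{1, 2, 3\}$, where

$$\mathbf{b}_1 = (1, 0, 0, 1, 0, 0, \ldots, 1, 0, 0)^T,$$
$$\mathbf{b}_2 = (0, 1, 0, 0, 1, 0, \ldots, 0, 1, 0)^T,$$
$$\mathbf{b}_3 = (0, 0, 1, 0, 0, 1, \ldots, 0, 0, 1)^T.$$

The kernel always contains the vectors $\mathbf{b}_i$, $i \in \{1, \ldots, 3\}$, since they span the planar solution
in eqn (6). This transformation generates a new basis $\mathbf{b}_i$, $i \in \{1, \ldots, d\}$, and provided
only solutions $\mathbf{w}$ with a non-zero component along the remaining basis vectors $\mathbf{b}_4, \ldots, \mathbf{b}_d$
are considered, the degenerate planar “reconstructions” are excised.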

In the sequel we concentrate on polyhedral recovery when d = 4; d > 4 is a topic for
future work, as the relationship between the space of solutions and the geometry of the
polyhedron is more complex. If the kernel is three dimensional, then it can contain only
the span of the first three vectors, and so the object must be planar. In all cases, the
kernel must have dimension at least three.
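
The construction of Sections 2.2 and 2.3 can be sketched numerically. The following is a minimal illustration under stated assumptions, not the implementation used for the experiments. It uses NumPy, a synthetic rotated unit cube with seven of its eight corners observed, and hypothetical helper names (`synthetic_cube_view`, `constraint_matrix`, `split_kernel`). The pose, the tolerance, and the use of the SVD to find the kernel are all our choices.

```python
import numpy as np


def synthetic_cube_view():
    """Project 7 corners of a rotated unit cube (hypothetical test scene).

    Returns image points, the three face indices meeting at each corner,
    and the true plane vector w = (a_1, b_1, c_1, ..., c_6).
    """
    corners = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
    faces = [(axis, side) for axis in range(3) for side in (0, 1)]  # x=0, x=1, y=0, ...
    cz, sz, cx, sx = np.cos(0.4), np.sin(0.4), np.cos(0.7), np.sin(0.7)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    R, t = Rz @ Rx, np.array([0.5, 0.4, 6.0])
    X = corners @ R.T + t                 # corners in the camera frame
    img = X[:, :2] / X[:, 2:3]            # pinhole projection onto Z = 1
    incident = [[j for j, (axis, side) in enumerate(faces) if c[axis] == side]
                for c in corners]
    w_true = []
    for axis, side in faces:
        normal = R[:, axis]
        point = R @ (side * np.eye(3)[axis]) + t    # a point on this face
        w_true.extend(-normal / (normal @ point))   # a X + b Y + c Z + 1 = 0
    return img[:7], incident[:7], np.array(w_true)  # drop the eighth corner


def constraint_matrix(img, incident, n_planes):
    """Stack the two linear constraints of eqn (4) for every observed vertex."""
    rows = []
    for (x, y), (p, q, r) in zip(img, incident):
        for j in (p, q):
            row = np.zeros(3 * n_planes)
            row[3 * j: 3 * j + 3] = (x, y, 1.0)
            row[3 * r: 3 * r + 3] = (-x, -y, -1.0)
            rows.append(row)
    return np.array(rows)


def split_kernel(A, n_planes, tol=1e-8):
    """Return kernel dimension, orthonormal planar basis, and the non-planar part."""
    _, s, Vt = np.linalg.svd(A)
    rank = int((s > tol).sum())
    kernel = Vt[rank:].T                                    # columns span ker A
    planar = np.tile(np.eye(3), (n_planes, 1)) / np.sqrt(n_planes)  # b1, b2, b3
    residual = kernel - planar @ (planar.T @ kernel)        # remove planar solutions
    U, sv, _ = np.linalg.svd(residual, full_matrices=False)
    return kernel.shape[1], planar, U[:, sv > tol]


img, incident, w_true = synthetic_cube_view()
A = constraint_matrix(img, incident, n_planes=6)
d, planar, extra = split_kernel(A, n_planes=6)
b4 = extra[:, 0]
planes = np.hstack([b4.reshape(6, 3), np.ones((6, 1))])     # one row (a, b, c, 1) per face
print("A is", A.shape, "with kernel dimension", d)
```

For this scene there are n = 6 planes and m = 7 vertices, so A is 14 x 18 and the kernel is expected to have dimension 3n - 2m = 4. The non-planar direction is obtained here by orthogonal projection, which is one convenient way of realising the change of basis described above.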

2.3.1  Position  free  polyhedra 

Under  image  noise  the  dimension  of  the  kernel  does  not  give  an  indication  of  whether  an 
object  is  position  free.  Algebraically,  an  object  is  position  free  if  the  rank  of  A  does  not 
change  as  we  move  from  the  exact  imaging  case  to  the  one  in  which  we  have  measurement 
error. The rank changes because constraints that are ideally dependent
within A cease to be so as the coefficients are perturbed.

An  example  of  an  object  that  is  not  position  free  is  the  truncated  tetrahedron  shown 
in  Fig.  2,  where  for  minor  perturbations  of  the  vertices  the  only  possible  reconstruction  of 
the  polyhedron  is  as  a  plane  figure.  The  incidence  relations  embedded  in  the  tetrahedron 
cannot  normally  be  observed  from  properties  of  measured  points  or  of  a  measured  kernel, 
as  measurement  error  destroys  the  relation.  As  a  result,  for  non-position  free  objects  the 




Figure 2: (a) shows a plan view of a truncated tetrahedron which is not position free but can
be realised as a 3D object. A different view of the same figure is shown in (c). If the point v
is perturbed, the point x defined by the intersection of three lines in (a) is no longer defined,
and so the polyhedron can be interpreted only as a plane figure (such as the one lying on
this page). In real images all vertices are perturbed by noise, and so for a non-position free
object such as a truncated tetrahedron one can never recover a full 3D interpretation.


theory  given  in  this  paper  does  not  provide  a  genuine  reconstruction  of  the  object.  If  useful 
information  is  to  be  extracted  from  the  images  of  non-position  free  objects  one  must  be 
able  to  spot  the  degeneracy  in  some  other  way  (a  similar  observation  was  made  by  Sparr 
in  [38]).  The  degeneracy  in  A  is  hard  to  observe  through  measurements  on  the  matrix,  and 
so  one  must  determine  the  appropriate  incidence  constraint  from  other  properties  of  the 
outline.^  Currently  techniques  based  on  observing  cycles  in  a  planar  graph  determined  from 
the  outline  are  being  investigated  [36].  Commonly,  but  not  always,  polyhedra  that  are  not 
position  free  lead  to  kernels  of  dimension  three,  allowing  only  plane  interpretations. 

For  a  perspective  view  of  a  position  free  polyhedron,  where  matrix  A  has  kernel  dimension 
four,  distinct  generic  elements  of  the  kernel  represent  projectively  equivalent  configurations 
of  planes.  Furthermore,  a  generic  element  of  the  kernel  represents  a  configuration  of  planes 
which  is  projectively  equivalent  to  some  subset  of  the  planes  of  the  original  polyhedron  (or 
the  whole  polyhedron).  Projective  equivalence  means  that  the  projective  invariants  of  a 
generic  element  of  the  kernel  will  be  equal  to  the  projective  invariants  of  the  actual  object. 
The full proof of this statement is given in Appendix A; here we describe why the solution
is invariant to camera calibration. Essentially, re-calibration causes an affine action on the
image plane which may be alternatively thought of as an affine action on space prior to
imaging. The camera matrix in eqn (1) may be expanded as:

^The natural algorithm, noting small singular values in the singular value decomposition of A, is unattractive
because it is hard to determine what threshold is appropriate to describe ‘small’.
$$P = A\,[I \mid 0]\,G,$$

where A is the affine calibration matrix, I the 3 x 3 identity matrix, and G a Euclidean action
in 3-space. “Commuting” through matrix A gives

$$P = [I \mid 0] \begin{bmatrix} A & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} G = [I \mid 0]\,A_4\,G.$$

Hence, camera calibration in the image is equivalent to an affine action on
3-space. Consequently, provided the measurements in 3-space are affine (or in our case
projective), we will have invariance to camera calibration.
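
The commutation step can be checked numerically. This is a small sketch assuming NumPy; the calibration values and the Euclidean motion are invented for the illustration, and the variable names are ours.

```python
import numpy as np

# Hypothetical affine calibration matrix (upper triangular, last row 0 0 1).
A = np.array([[800.0, 2.0, 320.0],
              [0.0, 780.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical Euclidean action G: rotation about Y plus a translation.
c, s = np.cos(0.5), np.sin(0.5)
G = np.eye(4)
G[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
G[:3, 3] = [0.2, -0.1, 4.0]

I0 = np.hstack([np.eye(3), np.zeros((3, 1))])   # the 3 x 4 matrix [I | 0]
A4 = np.eye(4)
A4[:3, :3] = A                                   # 4 x 4 affine action on 3-space

P_image_side = A @ I0 @ G      # calibration applied in the image
P_space_side = I0 @ A4 @ G     # same calibration applied to space before imaging
print(np.allclose(P_image_side, P_space_side))
```

Both products equal [A | 0] G, so the two camera matrices agree.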

2.4  Indexing  for  polyhedra 

Projective  invariants  are  required  to  index  an  object  model  based  on  these  reconstructions; 
projective  invariants  for  systems  of  planes  are  relatively  easily  formed.  For  a  system  of  n 
planes  in  general  position,  there  are  3n  -  15  independent  invariants  (using  the  counting 
argument  that  appears  in  [12]).  For  example,  given  6  planes,  a  possible  set  of  invariants  is: 

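$$I_1 = \frac{|I_{3561}|\,|I_{3542}|}{|I_{3564}|\,|I_{3512}|}, \qquad
I_2 = \frac{|I_{3562}|\,|I_{3142}|}{|I_{3512}|\,|I_{3642}|}, \qquad
I_3 = \frac{|I_{3564}|\,|I_{5612}|}{|I_{3561}|\,|I_{5642}|}, \qquad (7)$$

where $I_{abcd} = [\mathbf{v}_a, \mathbf{v}_b, \mathbf{v}_c, \mathbf{v}_d]$, and the vectors $\mathbf{v}_i$ are the homogeneous representations of the
planes. Invariants to linear groups typically take the form of ratios of determinants; see [30]
for proof of their invariance.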
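
The determinant ratios can be evaluated directly once six plane vectors are available. The sketch below assumes NumPy and uses random plane vectors; the function name `plane_invariants` is hypothetical, and the index pattern is our reading of eqn (7). It checks that the three values are unchanged by a random linear map of the plane vectors and by independent rescaling of each plane.

```python
import numpy as np


def plane_invariants(V):
    """V is a 6 x 4 array whose k-th row is the homogeneous plane vector v_{k+1}."""
    def det(a, b, c, d):
        # Determinant of the 4 x 4 matrix with columns v_a, v_b, v_c, v_d (1-based labels).
        return np.linalg.det(V[[a - 1, b - 1, c - 1, d - 1]].T)

    I1 = det(3, 5, 6, 1) * det(3, 5, 4, 2) / (det(3, 5, 6, 4) * det(3, 5, 1, 2))
    I2 = det(3, 5, 6, 2) * det(3, 1, 4, 2) / (det(3, 5, 1, 2) * det(3, 6, 4, 2))
    I3 = det(3, 5, 6, 4) * det(5, 6, 1, 2) / (det(3, 5, 6, 1) * det(5, 6, 4, 2))
    return np.array([I1, I2, I3])


rng = np.random.default_rng(0)
V = rng.normal(size=(6, 4))                    # six planes in general position
T = rng.normal(size=(4, 4))                    # a generic linear map on plane vectors
scales = rng.uniform(0.5, 2.0, size=(6, 1))    # arbitrary homogeneous rescaling
V_transformed = scales * (V @ T.T)

print(plane_invariants(V))
print(plane_invariants(V_transformed))
```

Each plane label appears equally often in numerator and denominator, so the per-plane scale factors cancel, and each ratio has two determinants above and two below, so the factor |T| cancels as well.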

We  can  therefore  compute  invariants  from  a  single  image  of  a  polyhedron  by  computing 
the  determinant  functions  for  any  of  the  possible  reconstructions  (here  we  are  making  use 
of  the  projective  equivalence  of  all  of  the  reconstruction  solutions).  Rather  than  choosing 
an  arbitrary  solution  (which  may  be  degenerate),  we  base  the  solution  on  the  fourth  basis 
vector  of  the  kernel,  which  is  composed  of  a  set  of  three-vectors  mj,  one  for  each  plane: 

$$\mathbf{b}_4 = (\mathbf{m}_1^T, \ldots, \mathbf{m}_n^T)^T.$$

Each vector representing the homogeneous coordinates of a plane is given as $\mathbf{v}_i = (\mathbf{m}_i^T, 1)^T$.

Note that when the four planes {a, b, c, d} forming $I_{abcd}$ become concurrent (for example,
a cycle of faces on a cuboid), the values of the invariants can become either infinite, undefined
or zero because $|I_{abcd}| = 0$. Without a priori knowledge of the configuration we would still
compute the invariants and index with them, and so such a degenerate configuration could
be troublesome. Certainly, in the error-free case we should still recover representative
invariants, but with errors in the point locations it becomes impossible to quantify ‘near
zero’ values; this is because under a projectivity any determinant can be made arbitrarily
large. An interesting current research issue is whether or not the determinant values really vary
by large amounts for typical viewing conditions.

2.5  Examples 

In  the  following  examples  of  caging,  all  the  virtual  polyhedra  have  at  least  four  identifiably 
coplanar  points.  Configurations  having  at  most  three  points  on  any  plane,  and  no  other 
constraints,  suffer  from  the  nil-invariance  result  of  [5,  7,  18,  28]. 
            I1            I2        I3
view a      [row partly illegible in the source; surviving value: 0.9862]
view b      [row partly illegible in the source; surviving value: 0.9657]
view c      [row partly illegible in the source; surviving value: 0.9796]
view d      [illegible]   0.9619    -0.9247
view e      -0.0012       0.9988    -1.0043
mean        [illegible in the source]
σ           [row partly illegible in the source; surviving value: 0.0482]
σ (%)       -             1.4       4.8

Table 1: The invariants computed for the views in Fig. 3, with their mean and standard
deviations. In the last row the standard deviation is shown as a percentage of the mean for
I2 and I3. The value is not computed for I1 because in this case the mean is ideally zero. In
fact, the cage on the punch is projectively equivalent to a cube and so the invariants should
be {I1, I2, I3} = {0, 1, 1}. From these values it is also evident that the invariants remain
constant over a change in viewpoint (see the graph in Fig. 4).

In  Fig.  3  a  series  of  five  images  of  a  punch  is  shown  with  the  same  seven  points  marked 
on  each  image.  These  points  are  the  images  of  vertices  of  the  virtual  polyhedron  for  which 
the  invariants  are  computed.  The  images  are  processed  using  a  Canny  [6]  edge  detector, 
and  straight  lines  fitted  to  the  edge  data  using  an  orthogonal  regression  routine.  Vertex 
positions are found by intersecting pairs of lines (by hand) and the points are then grouped
into planes to form a six sided polyhedron. The invariants for (a), (b) and (d) should be equal,
and those for (c) and (e) the same, though not necessarily equal to those of the three former
views. This is because a different set of points was used to compute the invariants for (c)
and (e). However, because of the reflectional symmetry of the object the invariants are equal
for all five views^.

The invariants measured for the five views are given in Table 1, and are essentially
constant over the change in viewpoint. The table gives the standard deviations, as
well as each standard deviation as a percentage of the mean invariant value; these
percentages are below 5%. No percentage is given for the first invariant, which is meant to be
zero because four of the planes intersect in a single point, and so the planes are not
algebraically independent.

In  Fig.  5  a  second  example  is  shown  with  two  views  of  a  calibration  table.  The  measured 
invariants  are  given  in  Table  2;  they  remain  constant  over  the  change  in  viewpoint,  but 
are  different  from  the  invariants  computed  for  the  punch  in  Fig.  3.  This  means  that  the 
invariants  are  both  stable  and  give  discrimination  between  different  objects.  Note  that 
different invariants provide differing amounts of discrimination between objects; for example,
between the punch and the calibration table I3 provides the only discrimination.

2.6  Verifying  the  polyhedral  assumption 

So  far  only  the  projective  structure  has  been  computed  for  the  point  set  represented  by  the 
seven  visible  points  in  the  images.  This  ignores  other  image  information  that  can  be  used  to 
confirm  the  hypothesis  that  a  caged  polyhedron  is  really  being  observed.  Assuming  that  the 
object  is  a  caged  polyhedron  allows  a  hypothesis  on  object  identity,  which  in  turn  predicts 
the  positions  of  other  points  in  the  image  (in  this  case  the  eighth  point  on  the  caging 
polyhedron).  If  these  other  points  are  visible,  the  predicted  and  actual  image  positions 

^The symmetry is equivalent to a projective automorphism on the object and so will not affect the
measured invariants.
Figure 3: (a)-(e) show five views of a punch and the points used to compute the invariants.
These points are caged by a polyhedron whose bounding planes are not actual object planes;
the caging polyhedron for (e) is shown in (f). The computed invariants are given in Table 1.




Figure  4:  The  measured  invariants  for  Fig.  3  and  Table  1  remain  unchanged  over  the  change 
in  viewpoint. 


Figure  5:  Two  views  of  the  calibration  table  used  to  test  the  invariants.  The  seven  points 
used  to  compute  the  invariants  are  marked  in  white.  In  the  right  image  the  eighth  point  is 
also  visible;  this  could  be  used  to  overconstrain  the  solution,  but  in  this  case  it  is  ignored. 
            I1          I2       I3
view 1      -0.00146    0.991    -6.30
view 2       0.00117    1.007    -6.44

Table 2: The invariants measured for the two views of the calibration table in Fig. 5. The
invariants stay fairly constant even under image noise. In this case, I3 can be used to
discriminate between the calibration table and the punch in Fig. 3; again {I1, I2} = {0, 1}
for the calibration table.


provide  an  independent  check  on  the  validity  of  the  caging  assumption.  The  positions  of  the 
remaining  points  can  be  determined  either  algebraically  or  geometrically,  as  shown  for  a  six 
sided  figure  in  the  following  sections. 

2.6.1  Algebraic  approach 

The  eighth  point  is  obtained  by  intersecting  the  three  virtual  planes  that  bound  the  eighth 
vertex.  It  can  be  shown  algebraically  that  the  locus  of  the  eighth  point  in  three  dimensions 
as  different  solutions  are  chosen  for  the  reconstruction,  is  a  line  in  space  that  passes  through 
the  camera  centre.  As  this  line  projects  to  a  point  in  the  image,  we  see  that  the  location  of 
the  eighth  point  is  again  independent  of  the  choice  of  reconstruction. 

The position of the point is shown in Fig. 6 for the examples given in Fig. 3. Instead
of just showing the extra point, the complete polyhedron used to cage the data points is
outlined. The good agreement between the predicted position of the eighth point and its
actual position (which is visible in some of the images) indicates the accuracy of the method.
The position of the eighth point and the caging polyhedron for Fig. 5 are likewise shown in
Fig. 7.


2.6.2  Geometric  approach 

Often geometric manipulations provide a much more intuitive understanding of the imaging
process than purely algebraic analysis. In this case, the position of the eighth point can also
be predicted geometrically, not by explicitly computing the 3D projective structure of the
polyhedron, but by an image-based construction detailed in Fig. 8.


3  Objects  with  bilateral  symmetry 

A  single  camera  imaging  a  bilaterally  symmetric  object  is  equivalent  to  two  identical  cameras 
viewing  the  half  structure,  where  one  camera  is  transformed  to  the  other  by  a  reflection  in 
the  object  symmetry  plane.  This  is  a  special  case  of  structure  recovery  from  a  single  image 
of  repeated  structures  [24].  Thus,  a  single  uncalibrated  perspective  image  of  a  bilaterally 
symmetric  object  is  mathematically  identical  to  an  uncalibrated  stereo  pair  of  the  half 
structure. Faugeras [10] and Hartley et al. [16] have shown that stereo reconstruction from
two uncalibrated perspective images generates a reconstruction differing from the actual
3D Euclidean geometry of the object by a 3D projective transformation. Consequently,
3D  projective  invariants  of  this  recovered  half-structure  have  the  same  value  as  projective 
invariants  measured  on  the  actual  Euclidean  half-structure. 

The equivalence with stereo means that epipolar geometry can be defined within a single
image, considerably simplifying the establishment of correspondences between the two
half structures. This is described in Section 3.1. As described above, invariants can be
Figure 6: The eighth point for the caging polyhedron. Note that at no stage has the position
of the eighth point been measured in the image, but its location has been computed from the
other seven points.
Figure 7: The caging polyhedron for the examples in Fig. 5.


computed  by  recovering  projective  structure  as  in  classical  two  view  stereo.  We  describe 
two  alternative  methods  which  exploit  the  extra  constraints  available  in  this  case.  The  first 
method (Section 3.2) uses a geometric argument to obtain planar projective invariants, with each
point-pair contributing two extra invariants. The second method (Section 3.3) is analytic,
providing an efficient method to recover the full 3D structure. In this case each point-pair
contributes three extra projective invariants.

It  is  worth  noting  that  because  only  projective  geometry  is  employed,  the  3D  object  need 
only  be  projectively  equivalent  to  one  with  bilateral  symmetry. 


3.1  Correspondence 

Lines  joining  corresponding  points  on  either  side  of  the  symmetry  plane  are  parallel  in  3D 
and are imaged as a set of lines converging to a vanishing point. These imaged correspondence
lines and the vanishing point are the analogue of “epipolar lines” and the “epipole”
in conventional stereo. We will use these terms from now on. The epipole can be determined
using two pairs of corresponding points as illustrated in Figs 9 and 10. These points
are matched by hand. Once the epipole has been computed, further correspondences are
simplified to a 1D search on the epipolar line.
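
The epipole computation described above can be sketched with homogeneous coordinates, where the line through two points and the intersection of two lines are both cross products. This is a minimal illustration only: the function names `cross`, `epipole` and `epipolar_line` are hypothetical and not taken from the report, and image points are assumed to be given as (x, y) pairs.

```python
def cross(p, q):
    """Cross product of two homogeneous 3-vectors.

    For two points it gives the line through them; for two lines it gives
    their intersection point.
    """
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def homogeneous(p):
    """Lift an image point (x, y) to homogeneous coordinates (x, y, 1)."""
    return (float(p[0]), float(p[1]), 1.0)


def epipole(a, a_sym, b, b_sym):
    """Intersect the correspondence lines a-a_sym and b-b_sym.

    The result is left homogeneous, so an epipole at infinity (third
    coordinate zero, as in an affine view) can still be represented.
    """
    line_a = cross(homogeneous(a), homogeneous(a_sym))
    line_b = cross(homogeneous(b), homogeneous(b_sym))
    return cross(line_a, line_b)


def epipolar_line(e, p):
    """Line through the epipole e (homogeneous) and the image point p.

    The symmetric partner of p is searched for along this line only.
    """
    return cross(e, homogeneous(p))
```

With the epipole fixed by two hand-matched pairs, the match for any further point p is restricted to `epipolar_line(e, p)`, which is the 1D search referred to in the text.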


3.2  Planar  invariants  -  geometric  method 

Lines  joining  corresponding  3D  points  intersect  the  symmetry  plane  at  the  midpoint  of  the 
corresponding  points.  Perspective  projection  does  not  preserve  mid-points.  However,  the 
images of the 3D midpoints can be computed from the image (see below). All 3D midpoints
are co-planar (they lie on the symmetry plane). There is therefore a plane projective
transformation between the set of imaged mid-points and the 3D points on the plane of symmetry.
Thus planar projective invariants can be measured in the image from the computed
midpoints. For example, five planar points have two plane projective invariants [30]:


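$$
I_1 = \frac{|N_{431}|\,|N_{521}|}{|N_{421}|\,|N_{531}|}, \qquad
I_2 = \frac{|N_{421}|\,|N_{532}|}{|N_{432}|\,|N_{521}|}
\tag{8}
$$

where $N_{ijk} = (\mathbf{x}_i, \mathbf{x}_j, \mathbf{x}_k)$, $|\cdot|$ is the determinant, and $\mathbf{x} = (x, y, 1)^\top$ is the homogeneous
representation of a 2D point. Each additional point-pair adds a planar mid-point and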


Figure 8: The geometric construction for finding the eighth hidden point of a six sided
polyhedron. Note that all of the properties of interest are projective (hence pairs of planes
will always intersect). (a) First find the intersection of planes $F_1$ (the top plane) and $F_2$
(the bottom plane): edges $e_1$ and $e_2$ both lie on the left hand side plane of the object and
so they will intersect at point $p_1$. As $e_1$ and $e_2$ lie respectively on planes $F_1$ and $F_2$, then
the two planes also intersect at $p_1$. By a similar argument with edges $e_3$ and $e_4$ construct
the point $p_2$ which also lies on both $F_1$ and $F_2$. Two planes intersect in projective 3-space
in a line; this is given by the points $p_1$ and $p_2$ and denoted by $l$. In (b) reverse the process
to find the hidden eighth point. This point must lie on the plane defined by $e_5$ and $e_6$. The
edge $e_6$ is not observable, but it can be computed: $e_5$ and $e_6$ both lie on the rear plane and
so must intersect (at a point $p_3$). As they lie on $F_1$ and $F_2$ respectively, $p_3$ must lie on $l$.
Therefore $p_3$ is defined as the point at which $e_5$ intersects $l$. Edge $e_6$ must pass through $v_1$
as well as $p_3$ and so $e_6$ is defined. The eighth point lies on $e_6$, and so its locus is restricted
to a line. The argument can be repeated for $e_7$ and $e_8$ to restrict the eighth point to lie on
$e_8$. Thus, the point is defined by the intersection of $e_6$ and $e_8$. This is shown in (c) with the
reconstruction of the hidden lines.

Figure 9: The epipole can be located using the intersection of lines between two corresponding
points on an object possessing a mirror symmetry (the points are marked by filled in circles).
Epipolar lines can then be constructed through the epipole to aid correspondence.


Figure  10:  The  epipolar  lines  for  two  marked  points  are  shown.  Note  that  the  corresponding 
points  (by  symmetry)  lie  on  the  lines.  The  four  points  marked  in  white  are  used  to  determine 
the  epipolar  structure. 

Figure 11: The midpoint of any pair of points can be computed using the cross ratio once
the  location  of  the  epipole  is  known.  The  midpoints  for  the  four  pairs  of  points  of  Fig.  15b 
are  marked. 


generates  two  extra  plane  projective  invariants. 
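
As a hedged illustration of eqn (8), the two five-point invariants can be evaluated directly from 3 x 3 determinants of homogeneous image points. The helper names `det3` and `five_point_invariants` are hypothetical; the sketch assumes the five points are given as (x, y) pairs in the order $\mathbf{x}_1, \ldots, \mathbf{x}_5$ and that no three of them are collinear (otherwise a denominator vanishes).

```python
def det3(c1, c2, c3):
    """Determinant of the 3x3 matrix whose columns are c1, c2, c3."""
    return (c1[0] * (c2[1] * c3[2] - c2[2] * c3[1])
            - c2[0] * (c1[1] * c3[2] - c1[2] * c3[1])
            + c3[0] * (c1[1] * c2[2] - c1[2] * c2[1]))


def five_point_invariants(points):
    """Return (I1, I2) of eqn (8) for five coplanar points given as (x, y)."""
    if len(points) != 5:
        raise ValueError("exactly five points are required")
    x = [(float(px), float(py), 1.0) for px, py in points]

    def n(i, j, k):
        # |N_ijk| with the 1-based indices used in the text.
        return det3(x[i - 1], x[j - 1], x[k - 1])

    i1 = (n(4, 3, 1) * n(5, 2, 1)) / (n(4, 2, 1) * n(5, 3, 1))
    i2 = (n(4, 2, 1) * n(5, 3, 2)) / (n(4, 3, 2) * n(5, 2, 1))
    return i1, i2
```

Each point index appears equally often in numerator and denominator, so the arbitrary scale of each homogeneous vector and the determinant of the plane projective transformation both cancel; this is why the values can be compared across views.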

The  image  of  the  3D  midpoints  can  be  computed  using  a  property  of  equally  spaced 
points  (see  [39]):  three  collinear  points,  separated  by  the  same  distance,  and  taken  with 
a point at infinity have a harmonic cross-ratio.^ The point at infinity on the line joining
two  corresponding  points  can  be  observed  since  it  is  imaged  as  a  vanishing  point.  Thus, 
the  position  of  the  midpoint  in  the  image  can  be  computed  from  the  image  coordinates  of 
the  corresponding  points  and  from  the  image  coordinates  of  the  vanishing  point.  This  is 
illustrated  in  Fig.  11.  Furthermore,  since  computing  a  point  that  has  a  fixed  cross-ratio  with 
respect  to  three  other  points  is  linear,  there  is  a  unique  solution. 
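
A minimal sketch of this midpoint computation follows, assuming the two corresponding image points and the vanishing point are supplied as (x, y) pairs and are collinear up to noise. Positions along the correspondence line are parameterised by t, with t = 0 at the first point and t = 1 at its partner; requiring the cross ratio of (a, a'; m, v) to equal -1 then gives t_m = t_v / (2 t_v - 1). The function name `imaged_midpoint` is hypothetical.

```python
def imaged_midpoint(a, a_sym, vanishing_point=None):
    """Image of the 3D midpoint of a corresponding point pair.

    a, a_sym        -- image points (x, y) of the symmetric pair
    vanishing_point -- image (x, y) of the point at infinity on the line
                       joining them; None means the vanishing point is at
                       infinity (affine view), where midpoints are preserved.
    """
    dx, dy = a_sym[0] - a[0], a_sym[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        raise ValueError("the two points of a pair must be distinct")

    if vanishing_point is None:
        t_mid = 0.5
    else:
        # Parameter of the vanishing point along the line (orthogonal
        # projection, which also absorbs small departures from collinearity).
        vx, vy = vanishing_point[0] - a[0], vanishing_point[1] - a[1]
        t_v = (vx * dx + vy * dy) / length_sq
        denom = 2.0 * t_v - 1.0
        if denom == 0.0:
            raise ValueError("midpoint is imaged at infinity")
        # Harmonic condition: cross ratio (a, a_sym; m, v) = -1.
        t_mid = t_v / denom

    return (a[0] + t_mid * dx, a[1] + t_mid * dy)
```

As t_v grows the result tends to t = 1/2, the ordinary Euclidean midpoint, which matches the affine case. The expression is linear in the unknown, consistent with the uniqueness remark above.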

An  alternative  method  of  obtaining  the  mid-point  is  to  use  the  “cross-construction” 
shown  in  Fig.  12.  This  involves  two  corresponding  point  pairs,  and  does  not  use  the  position 
of  the  epipole.  In  practice  the  geometric  construction  is  more  accurate  than  the  method 
involving  the  cross  ratio  as  the  dependence  on  the  position  of  the  epipole  is  removed  - 
problems  can  occur  if  the  epipole  is  distant  from  the  image  since  it  tends  to  become  poorly 
localized;  this  affects  the  mid-point  computation. 
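
The cross-construction can be sketched as follows. Since Fig. 12 is not reproduced here, the labelling of the four lines is an assumption: the two sides a-b and a'-b' are taken to meet at one point of the imaged symmetry line, and the two diagonals a-b' and a'-b at another, which holds because the four 3D points form a symmetric (isosceles) trapezoid. The function names are hypothetical.

```python
def cross(p, q):
    """Cross product of homogeneous 3-vectors (join of points / meet of lines)."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def cross_construction_midpoints(a, a_sym, b, b_sym):
    """Imaged midpoints of {a, a_sym} and {b, b_sym} without using the epipole.

    All inputs are image points (x, y); returns two (x, y) pairs.
    """
    a, a_sym, b, b_sym = [(float(p[0]), float(p[1]), 1.0)
                          for p in (a, a_sym, b, b_sym)]

    # Sides of the imaged trapezoid meet on the symmetry line ...
    m12 = cross(cross(a, b), cross(a_sym, b_sym))
    # ... and so do its diagonals.
    m34 = cross(cross(a, b_sym), cross(a_sym, b))
    symmetry_line = cross(m12, m34)

    # Each midpoint is where a correspondence line meets the symmetry line.
    mid_a = cross(cross(a, a_sym), symmetry_line)
    mid_b = cross(cross(b, b_sym), symmetry_line)
    return ((mid_a[0] / mid_a[2], mid_a[1] / mid_a[2]),
            (mid_b[0] / mid_b[2], mid_b[1] / mid_b[2]))
```

Only joins and intersections are used, so the construction is unchanged by any plane projective transformation and never needs the (possibly poorly localized) epipole.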

3.3  3D invariants - analytic method

The  3D  point  positions  are  reconstructed  using  a  canonical  coordinate  frame  for  an  object 
with  bilateral  symmetry,  illustrated  in  Fig.  13  (cf.  the  canonical  frame  construction  of  [34]). 
Note, structure is recovered to better than a projective ambiguity because of the orthogonality
constraints available. This reconstruction method is an extension to projection of the
affine method of Fawcett et al. [11].

The  specification  of  the  canonical  frame  requires  choosing  XY  coordinates  for  four  basis 
points,  and  the  Z  coordinate  for  one  of  the  basis  points.  Once  these  coordinates  are  specified 
the XYZ coordinates of any other point can be determined in the canonical frame.

The  four  basis  points  provide  a  projective  coordinate  system  for  the  symmetry  plane. 
Since  the  imaged  mid-points  and  symmetry  plane  are  within  a  projective  transformation, 


^That  is  a  cross  ratio  equal  to  negative  unity. 


Figure 12: The line of symmetry $l_s$ of the plane {a, a', b', b} can be determined geometrically
as follows: compute the lines $l_i$, $i \in \{1, \ldots, 4\}$, and then intersect as shown to give $m_{12}$ and
$m_{34}$; these constrain $l_s$. Then, the midpoints of {a, a'} and {b, b'} are defined by the
intersections of $l_a$, $l_b$ and $l_s$.


this  determines  the  XY  coordinates  in  the  canonical  frame.  It  only  remains  to  determine 
the  Z  coordinates.  This  proceeds  in  two  steps: 

1. Determine the 3 x 4 matrix P: This matrix projects from the canonical coordinate
frame $(X, Y, Z, 1)^\top$ to the image $(x, y, 1)^\top$.
It is determined from four imaged point-pairs.

The plane projective transformation from the symmetry plane (Z = 0) to the image is
given by:

$$
T = \begin{pmatrix} P_{11} & P_{12} & P_{14} \\ P_{21} & P_{22} & P_{24} \\ P_{31} & P_{32} & 1 \end{pmatrix}
$$


The correspondence between four imaged mid-points and their chosen position (the
basis points) in the canonical frame determines T. There remain only three unknowns
in P, $\{P_{13}, P_{23}, P_{33}\}$. From eqn (9) the 3D basis point $(X, Y, Z, 1)^\top$ (the one for which
Z has been chosen) generates two linear equations. Similarly, the symmetry related
point at $(X, Y, -Z, 1)^\top$ generates two equations. There are thus four linear equations
for the three unknowns $\{P_{13}, P_{23}, P_{33}\}$, which are solved using least squares.

Figure 13: The 3D reconstruction uses a coordinate frame with the X and Y axes in the
plane of symmetry, and the Z direction perpendicular to the plane. Each correspondence
line has constant XY coordinates, and the midpoint of any corresponding pair of points has
Z = 0.

2. Determine Z coordinates using P: XY coordinates for any other point-pairs are
determined by applying the inverse of T to their imaged mid-point (T maps the symmetry
plane to the image). Each individual point then
generates two (i.e. over-determined) equations for Z from eqn (9) (one for Z and
the other, from the symmetry related point, for -Z). In the noise free case the two
solutions will be equal; in practice the solutions are averaged to determine Z.
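
The linear equations used in the two steps can be written out explicitly. The following is a sketch under the assumption that eqn (9) is the usual projection $\lambda (x, y, 1)^\top = P (X, Y, Z, 1)^\top$ with $P_{34} = 1$; the vector $\mathbf{q}$ and the scale $\lambda$ are symbols introduced here for illustration only.

```latex
\begin{align*}
\mathbf{q} &= T\,(X, Y, 1)^{\top}, \\
\lambda\,(x, y, 1)^{\top} &= \mathbf{q} + Z\,(P_{13}, P_{23}, P_{33})^{\top}, \\
Z\,P_{13} - x\,Z\,P_{33} &= x\,q_3 - q_1, \\
Z\,P_{23} - y\,Z\,P_{33} &= y\,q_3 - q_2 .
\end{align*}
```

The last two lines follow by eliminating $\lambda = q_3 + Z P_{33}$. In step 1, Z is the chosen coordinate of the basis point, so these are two linear equations in $\{P_{13}, P_{23}, P_{33}\}$; the symmetry related point, imaged at $(x', y')$, contributes the same pair with $Z$ replaced by $-Z$. In step 2 the roles are reversed: with P known, the same relations are linear in the single unknown Z, for example $Z = (x\,q_3 - q_1)/(P_{13} - x\,P_{33})$, and the estimates from a point and its symmetric partner are averaged as described above.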

The  construction  is  valid  for  any  projective  or  affine  view  of  the  object  provided  the 
camera  optical  centre  does  not  lie  on  the  object  plane  of  symmetry.  Figure  14  illustrates 
different  views  of  the  reconstruction  (drawn  in  a  believable  Euclidean  frame)  of  a  stapler 
reconstructed  from  the  single  perspective  image  shown  in  Fig.  10. 


3.4  Verifying  the  bilateral  symmetry  assumption 

The epipole and epipolar geometry, which are determined from the basis point correspondence,
can be used as a test of the bilateral symmetry assumption. Points on one side must
lie  on  their  corresponding  epipolar  lines  generated  by  points  on  the  other  side. 
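
One way to turn this test into a number is to measure, for each putative pair, the perpendicular image distance of one point from the epipolar line generated by its partner. The sketch below is an illustration under stated assumptions: the names `epipolar_residual` and `supports_symmetry` and the pixel tolerance `tol` are hypothetical, the epipole is supplied as a homogeneous 3-vector, and image points are (x, y) pairs.

```python
import math


def cross(p, q):
    """Cross product of homogeneous 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def epipolar_residual(e, p, p_sym):
    """Distance of p_sym from the epipolar line through the epipole e and p."""
    line = cross(e, (float(p[0]), float(p[1]), 1.0))
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        raise ValueError("p coincides with the epipole; line undefined")
    return abs(line[0] * p_sym[0] + line[1] * p_sym[1] + line[2]) / norm


def supports_symmetry(e, pairs, tol=1.0):
    """True if every pair (p, p_sym) satisfies the epipolar constraint.

    tol is an assumed threshold in image units (e.g. pixels).
    """
    return all(epipolar_residual(e, p, q) <= tol for p, q in pairs)
```

A large residual for pairs that were matched independently of the epipolar constraint suggests that the object is not (projectively equivalent to) a bilaterally symmetric one, or that the basis correspondences are wrong.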

Figure 14: Three dimensional structure is recovered modulo a projectivity from the points
marked on the stapler from a single view. Four typical views are shown: (a) the viewpoint
is from a position close to that of the original camera view; (b) the observer has moved round
to the front of the stapler; (c) and (d) from other general viewpoints.

Figure 15: (a) A single view of an object with a plane of bilateral symmetry, such as a teaspoon,
is sufficient to allow a full 3D projective reconstruction. Only two pairs of distinguished
points are needed for the approach; these are recovered from surface markings and can be
used to determine the epipolar structure of the image (b). Once the epipolar constraints
are available one can produce an arbitrary number of correspondences; only eight points are
required to specify the reconstruction. All other point-pair correspondences are assigned 3D
projective coordinates based on the 3D locations of the first eight points.


3.5  Further  examples 

The  point  set  reconstruction  method  can  be  adapted  to  space  curves  and  lines  with  bilateral 
symmetry. 

First, we discuss space curves. The symmetry related curves are matched pointwise in
the  image  using  the  epipolar  geometry.  Only  two  pairs  of  corresponding  points  are  required 
to  initiate  the  construction.  For  example  in  Fig.  15,  two  pairs  of  points  can  be  extracted 
from  observable  object  markings  on  a  teaspoon  (the  part  of  the  spoon  of  interest  is  the  space 
curve  defined  by  its  boundary).  These  can  be  used  to  determine  the  position  of  the  epipole 
and  hence  define  the  epipolar  structure  given  in  (b).  The  epipolar  lines  are  initially  used  to 
determine  the  two  further  points  required  for  the  total  of  four  basis  points  (ideally  chosen  to 
span  the  length  of  the  object  to  yield  better  error  behaviour^).  Four  different  views  of  the 
reconstruction  for  Fig.  15  are  shown  in  Fig.  16  (the  3D  projective  representation  has  been 
constrained  to  lie  in  a  believable  Euclidean  frame). 

Second,  we  consider  the  computation  of  invariants  where  the  features  are  lines.  The 
computation  method  for  lines  differs  from  that  used  for  points.  The  method  builds  on  a 
construction,  proposed  in  [33]  and  used  by  [2,  9],  for  calculating  the  intersection  of  a  line, 
lying  out  of  the  plane,  with  a  plane  containing  four  points.  The  intersection  generates  an 
additional  point  on  the  plane,  and  plane  projective  invariants  can  then  be  measured  from 
the five coplanar points. The original construction employed two views of a general configuration.
Again, if the configuration has bilateral symmetry (there are now two symmetrically
related lines) the intersection can be determined from a single image. The method is illustrated
in Fig. 17, and the measured invariants (computed separately from each of two views)
from eqn (8) are given in Table 3. The invariants are reasonably stable under the change in viewpoint.

^The basis points in Fig. 11 are close to collinear. However, the image locations of the points are known
with sufficient (sub-pixel) accuracy that the construction works well. Note that the basis points effectively
span the planar region containing the midpoints of the spoon.

Figure 16: Four different views of the 3D reconstruction gained from Fig. 15. The construction
works very well: note the planarity of the handle recovered in (b), and the full 3D shape
in all of the images.


           $I_1$      $I_2$
view 1    -4.85     -0.211
view 2    -5.01     -0.211

Table  3:  The  measured  invariants  for  each  of  the  two  views  in  Fig.  17  are  given.  The  values 
remain  reasonably  constant  under  a  change  in  viewpoint. 

Figure 17: Two different views of a stapler are shown in (a) and (b). From each view a
pair of invariants can be measured from the image features highlighted in white: the four
coplanar points (coplanar by symmetry) and the symmetrically related lines that lie out of
the plane. Invariants are computed from five coplanar points: the four points shown, and a
fifth point generated by the intersection of the left white line with the plane containing the
four points. We denote the plane by $\Pi$, and the intersection point by $X_I$.

$X_I$ is computed in two stages. First, the projective transformation, T, is determined which
maps each of the four points in $\Pi$ onto its symmetrically related corresponding point. This
transformation maps points on $\Pi$ from one side of the symmetry plane to their correspondences
on the other. In particular, the correspondence to $X_I$ lies on the extension of the
right line. Therefore, $X_I$ will lie on the line that results from mapping the right line by T.
As $X_I$ also lies on the left line, we know it is given by the intersection of the left line and
the result of mapping the right line. $X_I$ is thus determined.

The  process  can  be  repeated  for  (b)  and  (d)  and  the  invariants  compared.  Note  that  invariants 
can  also  be  computed  by  projecting  the  left  line  onto  the  right  half  image;  these  invariants 
will  be  functionally  dependent  on  the  first  two  values. 


4  Discussion 


In this paper we have shown how projectively invariant indexing functions can be constructed
for 3D point sets given some assumption on structure. The types of structure that we have
discussed are relevant to polyhedra and objects possessing a bilateral symmetry. Together,
these results make a significant contribution to the applications of invariant theory and dispel
the belief that invariants cannot be measured for (non-general) three-dimensional point sets
from  a  single  image.  However,  there  is  still  much  work  to  be  done  if  these  descriptors  are 
to  be  used  within  the  type  of  reliable  object  recognition  system  that  we  desire  (along  the 
lines  of  that  reported  for  planar  objects  in  [37]).  The  following  two  sections  introduce  some 
of  the  considerations  required  before  such  systems  can  be  built. 

4.1  Caged  point  sets 

One  of  the  major  drawbacks  of  the  polyhedral  construction  is  that  it  assumes  the  presence 
of  points  grouped  onto  planes  prior  to  constraint  formulation.^  Additionally,  each  point 
must be constrained to lie on multiple planes. This is a much harder problem: perhaps the
solution lies in using invariants of lesser groups (such as affine invariants [45] or quasi-invariants [4])
to solve the grouping task before proceeding with full projective measures; to a certain
extent this idea forms the basis of Lowe's grouping work [21].

Another problem arises from the richness of the polyhedral description. To a certain
extent,  the  invariants  account  for  more  global  shape  features  than  just  the  local  vertex 
information  used  for  example  by  Wayner  [45].  This  develops  into  a  problem  with  respect 
to  the  property  of  position  freeness:  generally,  large  structures  are  unlikely  to  be  position 
free.  Although  a  graph  matching  technique  has  been  suggested  to  flag  when  polyhedra  are 
not  position  free  [36],  developing  a  principled  way  to  adjust  the  algebraic  constraints  so  that 
they  return  to  being  dependent  remains  very  much  a  topic  for  future  work.  An  alternative 
approach  would  be  to  measure  local  descriptions  and  develop  a  hypothesis  merging  strategy 
as  exploited  in  [37]. 

The  one  real  benefit  of  the  caging  and  projective  polyhedral  approach  to  recognition  is 
that  exact  projective  information  is  yielded  prior  to  recognition.  This  contrasts  with  the 
conclusions  of  Sugihara  [40],  who  through  using  a  Euclidean  world  model,  found  that  only 
parameterised families of shapes could be represented. However, another goal of future
work should be to extend the 4 dof polyhedral reconstruction presented in this paper to
encompass a more general class of polyhedra such as 5 dof figures, or perhaps those of
higher degree.

4.2  Objects  with  symmetry 

Two  view  invariants  are  attractive  because  of  their  simplicity.  Very  few  features  are  required, 
though  again,  grouping  (correspondence)  must  be  solved  intra-image.  Grouping  is  simplified 
once  the  epipolar  structure  has  been  computed:  with  the  single  view  invariants  for  symmetric 
objects,  solving  for  epipolar  structure  is  very  easy.  Once  two  pairs  of  matching  symmetric 
points  have  been  found  the  epipolar  structure  is  defined  and  many  other  correspondences 
are  available.  When  there  are  a  sufficient  number  of  correspondences  full  3D  projective 

^ Generally the world is not constructed of well formed polyhedra, and so any major application would
rely on caging. Even in cases in which actual polyhedra are visible, the extraction of accurate and complete
line drawings is currently considered hard. Recent research into the extraction of polyhedral image features
using snakes has appeared encouraging [31].

structure can be recovered with further use of the location of the epipole. The construction
is  also  remarkably  stable. 

As symmetry is prevalent in many environments it is clear that the single view symmetry
invariants have a broad scope within computer vision. In fact, the construction can be
applied  to  any  projectively  related  repeated  structures.  In  particular,  structures  repeated 
by  translation  or  reflections,  and  objects  projectively  equivalent  to  these,  have  the  simple 
epipolar  constraint  provided  by  four  points. 

A  Projective equivalence of all the kernel solutions

The  derivation  in  this  appendix  shows  that  the  measures  taken  for  polyhedra  in  Section  2  are 
in  fact  invariant  to  three-dimensional  projective  actions.  We  prove  the  following  theorem: 

Theorem: 

For a position free polyhedron with a matrix A of kernel dimension four, all of
the solutions in the kernel represent reconstructions of space that are projectively
equivalent.

Proof: 

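The solution space for the polyhedra is represented by $\mathbf{w} = \sum_{i=1}^{4} \mu_i \mathbf{b}_i$. We
ensure the basis has the form with the first three vectors as in eqn (7), and then
let the fourth basis vector be composed of a set of three-vectors $\mathbf{m}_j$, one for each
plane:

$$
\mathbf{b}_4 = (\mathbf{m}_1^\top, \ldots, \mathbf{m}_n^\top)^\top .
$$

Then, considering solutions for the planes of the form $\mathbf{v} = (\mathbf{u}^\top, 1)^\top$, we have:

$$
\begin{pmatrix} \mathbf{u}_1 \\ \vdots \\ \mathbf{u}_n \end{pmatrix}
= \mu_1 \mathbf{b}_1 + \mu_2 \mathbf{b}_2 + \mu_3 \mathbf{b}_3 + \mu_4
\begin{pmatrix} \mathbf{m}_1 \\ \vdots \\ \mathbf{m}_n \end{pmatrix} .
$$

Letting $\lambda = 1/\mu_4$, $\mathbf{a} = -\lambda\,(\mu_1, \mu_2, \mu_3)^\top$, gives:

$$
\mathbf{m}_i = \lambda\,\mathbf{u}_i + \mathbf{a} .
$$

Writing $I_3$ as the 3 x 3 identity matrix yields:

$$
\begin{pmatrix} \mathbf{m}_i \\ 1 \end{pmatrix}
= \begin{pmatrix} \lambda I_3 & \mathbf{a} \\ \mathbf{0}^\top & 1 \end{pmatrix}
\begin{pmatrix} \mathbf{u}_i \\ 1 \end{pmatrix},
$$

or,

$$
\mathbf{M}_i = P\,\mathbf{V}_i .
$$

Therefore, each choice of $\mathbf{M}_i$ is a projectivity of the actual world planes $\mathbf{V}_i$.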

References 

[1] Barrett, E.B., Payton, P.M. and Brill, M.H. “Contributions to the Theory of Projective
Invariants for Curves in Two and Three Dimensions,” Proceedings 1st DARPA-ESPRIT
Workshop on Invariance, p.387-425, March 1991.

[2] Beardsley, P.A. “Applications of Projective Geometry to Robot Vision,” D.Phil. Thesis,
Department of Engineering Science, Oxford University, 1992.

[3] Beardsley, P.A., Zisserman, A. and Murray, D.W. “Navigation using Affine Structure
from Motion,” Proceedings ECCV3, 1994.

[4] Binford, T.O. and Levitt, T.S. “Quasi-Invariants: Theory and Explanation,” Proceedings
DARPA IUW, p.819-829, 1993.

[5] Burns, J.B., Weiss, R.S. and Riseman, E.M. “The Non-Existence of General-Case
View-Invariants,” in [30], p.120-131, 1992.

[6] Canny, J.F. “A Computational Approach to Edge Detection,” IEEE Trans. PAMI, Vol.
8, No. 6, p.679-698, 1986.

[7]  Clemens,  D.T.  and  Jacobs,  D.W.  “Model  Group  Indexing  for  Recognition,”  Proceedings 
CVPR91,  p.4-9,  1991,  and  IEEE  Trans.  PAMI,  Vol.  13,  No.  10,  p. 1007-1017,  October 
1991. 

[8]  Clowes,  M.B.  “On  Seeing  Things,”  Artificial  Intelligence,  Vol.  2,  p. 79-116,  1971. 

[9]  Demey,  S.,  Zisserman,  A.  and  Beardsley,  P.  “Affine  and  Projective  Structure  from 
Motion,”  Proceedings  BMVC92,  p.49-58,  1992. 

[10]  Faugeras,  O.  “What  can  be  Seen  in  Three  Dimensions  with  an  Uncalibrated  Stereo 
Rig?”  Proceedings  ECCV2,  p.563-578,  1992. 

[11]  Fawcett,  R.  Zisserman,  A.  and  Brady,  J.M.  “Extracting  Structure  from  Single  Affine 
Views  of  3D  Point  Sets  with  One  or  Two  Bilateral  Symmetries,”  Proceedings  BMVC93, 
1993. 

[12] Forsyth, D.A., Mundy, J.L., Zisserman, A.P., Coelho, C., Heller, A. and Rothwell, C.A.
“Invariant Descriptors for 3-D Object Recognition and Pose,” IEEE Trans. PAMI, Vol.
13, No. 10, p.971-991, October 1991.

[13] Forsyth, D.A. “Recognizing Algebraic Surfaces from their Outlines,” Proceedings
ICCV4, p.476-480, 1993.

[14] Gordon, G.G. “Shape from Symmetry,” Proceedings SPIE Intelligent Robots and Computer
Vision VIII, Algorithms and Techniques, Vol. 1192, 1989.

[15] Gros, P. “Outils Geometriques pour la Modelisation et la reconnaissance d’objets
polyedriques,” Ph.D. thesis, LIFIA-IMAG-INRIA, Grenoble, 1993.

[16] Hartley, R.I., Gupta, R. and Chang, T. “Stereo from Uncalibrated Cameras,” Proceedings
CVPR92, p.761-764, 1992.

[17] Huffman, D.A. “Impossible Objects as Nonsense Sentences,” Machine Intelligence, Vol.
6, Meltzer, B. and Michie, D. editors, Edinburgh University Press, 1971.

[18] Huttenlocher, D.P. and Kleinberg, J.M. “On Invariants of Sets of Points or Line Segments
Under Projection,” TR-92-1292, Cornell University, 1992.

[19] Jacobs, D.W. “Space Efficient 3D Model Indexing,” Proceedings CVPR92, p.439-444,
1992.

[20] Lamdan, Y., Schwartz, J.T. and Wolfson, H.J. “Object Recognition by Affine Invariant
Matching,” Proceedings CVPR88, p.335-344, 1988.

[21] Lowe, D.G. Perceptual Organization and Visual Recognition, Kluwer Academic Publishers,
1985.

[22] Lowe, D.G. “The Viewpoint Consistency Constraint,” International Journal of Computer
Vision, Vol. 1, No. 1, p.57-72, 1987.

[23] Liu, J., Mundy, J.L., Forsyth, D.A., Zisserman, A. and Rothwell, C.A. “Efficient
Recognition of Rotationally Symmetric Surfaces and Straight Homogeneous Generalized
Cylinders,” Proceedings CVPR93, p.123-128, 1993.

[24]  Liu  J.S.,  Mundy  J.L.  and  Walker  E.L.,  “Recognizing  Arbitrary  Objects  from  Multiple 
Projections,”  Proc.  Asian  Conference  in  Computer  Vision,  1993. 

[25]  Mackworth,  A.K.  “Interpreting  Pictures  of  Polyhedral  Scenes,”  Artificial  Intelligence, 
Vol.  4,  p.99-118,  1973. 

[26]  Mitsumoto  H.,  Tamura  S.,  Okazaki  K.,  Kajimi  N.  and  Fukui  Y.,  “3D  Reconstruction 
Using  Mirror  Images  Based  on  a  Plane  Symmetry  Recovery  Method”,  PAMI,  14,  9, 
941-945,  1992. 

[27]  Mohr,  R.  and  Morin,  L.  “Relative  Positioning  from  Geometric  Invariants,”  Proceedings 
CVPR91,  p.139-144,  1991. 

[28]  Moses,  Y.  and  Ullman,  S.  “Limitations  of  Non  Model-Based  Recognition  Systems,” 
Proceedings  ECCV2,  p.820-828,  1992. 

[29]  Mundy,  J.L.  and  Heller,  A.J.  “The  Evolution  and  Testing  of  a  Model-Based  Object 
Recognition  System,”  Proceedings  ICCV3,  p. 268-282,  1990. 

[30]  Mundy,  J.L.  and  Zisserman,  A.P.  Geometric  Invariance  in  Computer  Vision,  MIT 
Press,  1992. 

[31] Mundy, J.L., Huang, C., Liu, J., Hoffman, W., Forsyth, D.A., Rothwell, C.A., Zisserman,
A., Utcke, S. and Bournez, O. “MORSE: A 3D Object Recognition System Based
on Geometric Invariants,” to appear ARPA IUW, 1994.

[32] Pillow, N., Utcke, S. and Zisserman, A. “Viewpoint-Invariant Representation of Generalized
Cylinders Using the Symmetry Set,” to appear, BMVC94.

[33] Quan, L. and Mohr, R. “Towards Structure from Motion for Linear Features through
Reference Points,” Proceedings IEEE Workshop on Visual Motion, 1991.

[34] Rothwell, C.A., Zisserman, A., Forsyth, D.A. and Mundy, J.L. “Canonical Frames for
Planar Object Recognition,” Proceedings ECCV2, p.757-772, 1992.

[35] Rothwell, C.A., Zisserman, A., Mundy, J.L. and Forsyth, D.A. “Efficient Model Library
Access by Projectively Invariant Indexing Functions,” Proceedings CVPR92, p.109-114,
1992.

[36]  Rothwell,  C.A.,  Forsyth,  D.A.,  Zisserman,  A.  and  Mundy,  J.L.  “Extracting  Projective 
Information  from  Single  Views  of  3D  Point  Sets,”  TR  OUEL  1927/92,  Department  of 
Engineering  Science,  Oxford  University,  Oxford,  1992. 

[37]  Rothwell,  C.A.  “Recognition  Using  Projective  Invariance”,  D.Phil.  Thesis,  Department 
of  Engineering  Science,  University  of  Oxford,  Oxford,  1993,  to  appear  OUP  1994. 

[38]  Sparr,  G.  “Depth  Computations  from  Polyhedral  Images,”  Proceedings  ECCV2,  p.378- 
386,  1992. 

[39]  Springer,  C.E.  Geometry  and  Analysis  of  Projective  Spaces,  Freeman,  1964. 

[40]  Sugihara,  K.  Machine  interpretation  of  Line  Drawings,  MIT  Press,  1986. 

[41]  Taubin,  G.  and  Cooper,  D.B.  “Recognition  and  Positioning  of  3D  Piecewise  Algebraic,” 
Proceeding  DARPA  Image  Understanding  Workshop,  p. 508-514,  September  1990. 

[42] Thompson, D.W. and Mundy, J.L. “Three-Dimensional Model Matching from an Unconstrained
Viewpoint,” Proceedings ICRA, p.208-220, 1987.

[43] Van Gool, L., Kempenaers, P. and Oosterlinck, A. “Recognition and Semi-Differential
Invariants,” Proceedings CVPR91, p.454-460, 1991.

[44] Waltz, D. “Understanding Line Drawings of Scenes with Shadows,” The Psychology of
Computer Vision, Winston, P.H. editor, McGraw-Hill, p.19-91, 1975.

[45] Wayner, P.C. “Efficiently Using Invariant Theory for Model-Based Matching,” Proceedings
CVPR91, p.473-478, 1991.

[46] Weinshall, D. “Model-Based Invariants for 3-D Vision,” IJCV, Vol. 10, No. 1, p.27-42,
1993.

[47] Weiss, I. “Projective Invariants of Shapes,” Proceedings DARPA Image Understanding
Workshop, p.1125-1134, April 1988.

[48] Zhang, A., Deriche, R., Faugeras, O. and Luong, Q-T. “A Robust Technique for Matching
Two Uncalibrated Images Through the Recovery of the Unknown Epipolar Geometry,”
TR INRIA, Sophia Antipolis, 1994.

[49]  Zisserman,  A.P.,  Forsyth,  D.A.,  Mundy,  J.L.,  Rothwell,  C.A.  and  Liu,  J.  “3D  Object 
Recognition  using  Invariance”,  submitted  for  publication,  1994. 




Solving  Polynomial  Systems  Using  a  Branch  and  Prune  Approach 


P. Van Hentenryck
Brown University
Box 1910
Providence, RI 02912
pvh@cs.brown.edu


D. McAllester
MIT AI Lab
Technology Square, 545
Cambridge, USA
dam@ai.mit.edu


D. Kapur
SUNY at Albany
Dep. of Computer Science
Albany, NY-12222
kapur@cs.albany.edu


Abstract 

This paper presents Newton, a branch & prune algorithm to find all isolated solutions of a system
of polynomial constraints. Newton can be characterized as a global search method which uses
intervals for numerical correctness and for pruning the search space early. The pruning in Newton
consists in enforcing at each node of the search tree a unique local consistency condition, called
box-consistency, which approximates the notion of arc-consistency well-known in artificial intelligence.
Box-consistency is parametrized by an interval extension of the constraint and can be instantiated
to produce the Hansen-Sengupta narrowing operator (used in interval methods) as well as new operators
which  are  more  effective  when  the  computation  is  far  from  a  solution.  Newton  has  been  evaluated 
on  a  variety  of  benchmarks  from  kinematics,  chemistry,  combustion,  economics,  and  mechanics. 
On  these  benchmarks,  it  outperforms  the  interval  methods  we  are  aware  of  and  compares  well  with 
state-of-the-art  continuation  methods.  Limitations  of  Newton  (e.g.,  a  sensitivity  to  the  size  of  the 
initial  intervals  on  some  problems)  are  also  discussed.  Of  particular  interest  is  the  mathematical 
and  programming  simplicity  of  the  method. 

AMS  subject  Classification:  65H10,  65G10 

Keywords:  System  of  Equations,  Global  Methods,  Interval  and  Finite  Analysis 


1  Introduction 

Many  applications  in  science  and  engineering  (e.g.,  chemistry,  robotics,  economics,  mechanics) 
require  finding  all  isolated  solutions  to  a  system  of  polynomial  constraints  over  real  numbers.  This 
problem  is  difficult  due  to  its  inherent  computational  complexity  (i.e.,  it  is  NP-hard)  and  due  to 
the  numerical  issues  involved  to  guarantee  correctness  (i.e.,  finding  all  solutions)  and  to  ensure 
termination.  Several  interesting  methods  have  been  proposed  in  the  past  for  this  task,  including 
two fundamentally different methods: interval methods (e.g., [4, 5, 7, 8, 11, 13, 14, 15, 20, 26, 30])
and  continuation  methods  (e.g.,  [25,  35]).  Continuation  methods  have  been  shown  to  be  effective 
for  problems  for  which  the  total  degree  is  not  too  high,  since  the  number  of  paths  explored  depends 
on  the  estimation  of  the  number  of  solutions.  Interval  methods  are  generally  robust  but  tend  to  be 
slow. 

The  purpose  of  this  paper  is  to  propose  and  to  study  a  novel  algorithm  called  Newton.  From 
a user standpoint, Newton receives as input a system of polynomial constraints over, say, variables
$x_1, \ldots, x_n$ and a box, i.e., an interval tuple $(I_1, \ldots, I_n)$ specifying the initial range of these variables;
it returns a set of boxes of specified accuracy containing all solutions.
Operationally, Newton is a branch & prune algorithm which was inspired by the traditional
branch and bound approach used to solve combinatorial optimization problems. Newton uses
intervals to address the two fundamental problems listed above. Numerical reliability is obtained by
evaluating functions over intervals using outward rounding (as in interval methods). The complexity
issue is addressed by using constraints to reduce the intervals early in the search. The pruning
in Newton is achieved by enforcing a unique local consistency condition, called box-consistency,
at each node of the search tree. Box-consistency is an approximation of arc-consistency, a notion
well-known in artificial intelligence [17, 19] and used to solve discrete combinatorial problems in
several systems (e.g., [32, 33]). Box-consistency is parametrized by an interval extension operator
for the constraint and can be instantiated to produce various narrowing operators. In particular,
box-consistency on the Taylor extension of the constraint produces a generalization of the
Hansen-Sengupta operator [8], well-known in interval methods. In addition, box-consistency on the natural
extension produces narrowing operators which are more effective when the algorithm is not near a
solution. Newton has the following properties:

•  Correctness: Newton finds all isolated solutions to the system in the following sense: if
$(v_1, \ldots, v_n)$ is a solution, then Newton returns at least one box $(I_1, \ldots, I_n)$ such that
$v_i \in I_i$ ($1 \le i \le n$). In addition, Newton may guarantee the existence of a unique solution in some
or  all  the  boxes  in  the  result.  If  the  solutions  are  not  isolated  (e.g.,  the  floating-point  system 
is  not  precise  enough  to  separate  two  solutions),  then  the  boxes  returned  by  the  algorithm 
may  contain  several  solutions. 

•  Termination:  Newton  always  terminates  in  finite  time. 

•  Effectiveness:  Newton  has  been  evaluated  on  a  variety  of  benchmarks  from  kinematics, 
chemistry,  combustion,  economics,  and  mechanics.  It  outperforms  the  interval  methods  we 
are  aware  of  and  compares  well  with  state-of-the-art  continuation  methods  on  many  problems. 
Interestingly,  Newton  solves  the  Broyden  banded  function  problem  [8]  and  More-Cosnard 
discretization  of  a  nonlinear  integral  equation  [23]  for  several  hundred  variables. 

•  Simplicity  and  Uniformity:  Newton  is  based  on  simple  mathematical  results  and  is  easy 
to  use  and  to  implement.  It  is  also  based  on  a  single  concept:  box-consistency. 

The  rest  of  this  paper  is  organized  as  follows.  Section  2  gives  an  overview  of  the  approach. 
Section  3  contains  the  preliminaries.  Section  4  presents  an  abstract  version  of  the  branch  &  prune 
algorithm.  Section  5  discusses  the  implementation  of  the  box-consistency.  Section  6  describes  the 
experimental  results.  Section  7  discusses  related  work  and  the  development  of  the  ideas  presented 
here.  Section  8  concludes  the  paper. 

2  Overview  of  The  Approach 

As  mentioned,  Newton  is  a  global  search  algorithm  which  solves  a  problem  by  dividing  it  into 
subproblems  which  are  solved  recursively.  In  addition,  Newton  is  a  branch  &  prune  algorithm 
which  means  that  it  is  best  viewed  as  an  iteration  of  two  steps 

1.  pruning  the  search  space; 

2.  making  a  nondeterministic  choice  to  generate  two  subproblems 
until one or all solutions to a given problem are found.

The pruning step ensures that some local consistency conditions are satisfied.
It  consists  of  reducing  the  intervals  associated  with  the  variables  so  that  every  constraint  appears 
to  be  locally  consistent.  The  local  consistency  condition  of  Newton  is  called  box-consistency,  an 
approximation  of  arc-consistency,  a  notion  well-known  in  artificial  intelligence  [17,  19]  and  used 
in  many  systems  (e.g.,  [33,  34,  31])  to  solve  discrete  combinatorial  search  problems.  Informally 
speaking,  a  constraint  is  arc-consistent  if  for  each  value  in  the  range  of  a  variable  there  exist  values 
in  the  ranges  of  the  other  variables  such  that  the  constraint  is  satisfied.  Newton  approximates 
arc-consistency  which  cannot  be  computed  on  real  numbers  in  general. 

The pruning step either fails, showing the absence of solution in the intervals, or succeeds in
enforcing the local consistency condition. Sometimes, local consistency also implies global consistency
as in the case of the Broyden banded function, i.e.,

$$f_i(x_1, \ldots, x_n) = x_i (2 + 5 x_i^2) + 1 - \sum_{j \in J_i} x_j (1 + x_j) \qquad (1 \le i \le n)$$

where $J_i = \{\, j \mid j \neq i \ \&\ \max(1, i-5) \le j \le \min(n, i+1) \,\}$. The pruning step of Newton
solves this problem in essentially linear time for initial intervals of the form $[-10^8, 10^8]$ and always
proves the existence of a solution in the final box. However, in general, local consistency does not
imply global consistency either because there are multiple solutions or simply because the local
consistency condition is too weak. Consider the intersection of a circle and of a parabola:

$$\begin{cases} x_1^2 + x_2^2 = 1 \\ x_1^2 - x_2 = 0 \end{cases}$$

with initial intervals in $[-10^8, 10^8]$ and assume that we want the resulting intervals to be of size
$10^{-8}$ or smaller. The pruning step returns the intervals

$x_1 \in$ [-1.0000000000012430057, +1.0000000000012430057]
$x_2 \in$ [-0.0000000000000000000, +1.0000000000012430057]

Informally speaking, Newton obtains the above pruning as follows. The first constraint is used to
reduce the interval of $x_1$ by searching for the leftmost and rightmost “zeros” of the interval function

$$X_1^2 + [-10^8, 10^8]^2 = 1$$

These zeros are obviously -1 and 1 and hence the new interval for $x_1$ becomes [-1,1] (modulo the
numerical precision of the system). The same reasoning applies to $x_2$. The second constraint can
be used to reduce further the interval of $x_2$ by searching for the leftmost and rightmost zeros of

$$[-1, 1]^2 - X_2 = 0$$

producing the interval [0,1] for $x_2$. No more reduction is obtained by Newton and branching is
needed to make progress. Branching on $x_1$ produces the intervals

$x_1 \in$ [-1.0000000000012430057, +0.0000000000000000000]
$x_2 \in$ [-0.0000000000000000000, +1.0000000000012430057]
Further pruning is obtained by taking the Taylor extension of the polynomials in the above equations
around (-0.5, 0.5), the center of the above box, and by conditioning the system. Newton then uses
these new constraints to find their leftmost and rightmost zeros, as was done previously, to obtain

$x_1 \in$ [-1.0000000000000000000, -0.6049804687499998889]
$x_2 \in$ [+0.3821067810058591529, +0.8433985806150308129].

Additional pruning using the original statement of the constraints leads to the first solution

$x_1 \in$ [-0.7861513777574234974, -0.7861513777574231642]
$x_2 \in$ [+0.6180339887498946804, +0.6180339887498950136].

Backtracking on the choice for $x_1$ produces the intervals

$x_1 \in$ [-0.0000000000000000000, +1.0000000000012430057]
$x_2 \in$ [-0.0000000000000000000, +1.0000000000012430057]

and leads to the second solution

$x_1 \in$ [+0.7861513777574231642, +0.7861513777574233864]
$x_2 \in$ [+0.6180339887498946804, +0.6180339887498950136].
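The first pruning step of this example, the search for the leftmost and rightmost "zeros" of the circle constraint once $x_2$ is replaced by its interval, can be illustrated with a plain bisection search. This is only a minimal sketch under stated assumptions: intervals are (lo, hi) tuples of floats, outward rounding is omitted, and the helper names (`sqr`, `circle`, `leftmost_zero`, `rightmost_zero`) are hypothetical. Newton's actual narrowing procedure is described in Section 5 and is not reproduced here.

```python
def sqr(x):
    """Exact range of t**2 for t in the interval x = (lo, hi)."""
    lo, hi = x
    if lo <= 0.0 <= hi:
        return (0.0, max(lo * lo, hi * hi))
    return (min(lo * lo, hi * hi), max(lo * lo, hi * hi))

def circle(x1, x2):
    """Interval evaluation of x1^2 + x2^2 - 1."""
    a, b = sqr(x1), sqr(x2)
    return (a[0] + b[0] - 1.0, a[1] + b[1] - 1.0)

def contains_zero(x):
    return x[0] <= 0.0 <= x[1]

def leftmost_zero(F, lo, hi, eps):
    """Left end of the leftmost sub-interval of width <= eps on which F may vanish."""
    if not contains_zero(F((lo, hi))):
        return None
    if hi - lo <= eps:
        return lo
    mid = (lo + hi) / 2.0
    left = leftmost_zero(F, lo, mid, eps)
    return left if left is not None else leftmost_zero(F, mid, hi, eps)

def rightmost_zero(F, lo, hi, eps):
    """Right end of the rightmost sub-interval of width <= eps on which F may vanish."""
    if not contains_zero(F((lo, hi))):
        return None
    if hi - lo <= eps:
        return hi
    mid = (lo + hi) / 2.0
    right = rightmost_zero(F, mid, hi, eps)
    return right if right is not None else rightmost_zero(F, lo, mid, eps)

X2 = (-1e8, 1e8)                      # current interval of x2

def unary(x1):
    """Unary interval function obtained by fixing x2 to its interval."""
    return circle(x1, X2)

new_x1 = (leftmost_zero(unary, -1e8, 1e8, 1e-6),
          rightmost_zero(unary, -1e8, 1e8, 1e-6))
```

With these assumptions `new_x1` is a slight over-approximation of [-1, 1], matching the reduction described in the text up to the chosen tolerance.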

Note  that,  in  this  case,  Newton  makes  the  smallest  number  of  choices  to  isolate  the  solutions.*  To 
conclude  this  motivating  section,  let  us  illustrate  Newton  on  a  larger  example  which  describes  the 
inverse  kinematics  of  an  elbow  manipulator  [11]: 

$$\begin{cases}
s_2 c_5 s_6 - s_3 c_5 s_6 - s_4 c_5 s_6 + c_2 c_6 + c_3 c_6 + c_4 c_6 = 0.4077 \\
c_1 c_2 s_5 + c_1 c_3 s_5 + c_1 c_4 s_5 + s_1 c_5 = 1.9115 \\
s_2 s_5 + s_3 s_5 + s_4 s_5 = 1.9791 \\
c_1 c_2 + c_1 c_3 + c_1 c_4 + c_1 c_2 + c_1 c_3 + c_1 c_2 = 4.0616 \\
s_1 c_2 + s_1 c_3 + s_1 c_4 + s_1 c_2 + s_1 c_3 + s_1 c_2 = 1.7172 \\
s_2 + s_3 + s_4 + s_2 + s_3 + s_2 = 3.9701 \\
s_i^2 + c_i^2 = 1 \quad (1 \le i \le 6)
\end{cases}$$

and assume that the initial intervals are in $[-10^8, 10^8]$ again. The pruning step returns the intervals

[-1.0000000000000000000, +1.0000000000000000000]
[-1.0000000000000000000, +1.0000000000000000000]
[+0.3233666666666665800, +1.0000000000000000000]
[-1.0000000000000000000, +1.0000000000000000000]
[-0.0149500000000000189, +1.0000000000000000000]
[-1.0000000000000000000, +1.0000000000000000000]
[-0.0209000000000001407, +1.0000000000000000000]
[-1.0000000000000000000, +1.0000000000000000000]
[+0.6596999999999998420, +1.0000000000000000000]
[-0.7515290480087772896, +0.7515290480087772896]
[-1.0000000000000000000, +1.0000000000000000000]
[-1.0000000000000000000, +1.0000000000000000000]

*This example can also be solved by replacing $x_1^2$ in the first equation by $x_2$ to obtain a univariate constraint in
X2  which  can  be  solved  independently.  However,  this  cannot  always  be  done  and  the  discussion  here  is  what  Newton 
would  do,  without  making  such  optimizations. 
showing already some interesting pruning. After exactly 12 branchings and in less than a second,
Newton produces the first box with a proof of existence of a solution in the box.
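The two-step iteration of this section (prune, then branch) can be sketched on the circle and parabola system as follows. This is an illustrative simplification, not Newton itself: the `prune` function below is a hypothetical stand-in that only rejects boxes on which the interval evaluation of some constraint excludes 0, whereas Newton enforces box-consistency. Intervals are (lo, hi) tuples, outward rounding is approximated with `math.nextafter` (Python 3.9 or later), and the initial box and tolerance are chosen only for the example.

```python
import math

def out(lo, hi):
    """Outward rounding: widen the interval by one float on each side."""
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

def add(x, y):
    return out(x[0] + y[0], x[1] + y[1])

def sub(x, y):
    return out(x[0] - y[1], x[1] - y[0])

def sqr(x):
    lo, hi = x
    low = 0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi)
    return out(low, max(lo * lo, hi * hi))

def constraints(box):
    """Interval evaluations of x1^2 + x2^2 - 1 and x1^2 - x2 over the box."""
    x1, x2 = box
    return [sub(add(sqr(x1), sqr(x2)), (1.0, 1.0)), sub(sqr(x1), x2)]

def prune(box):
    """Stand-in pruning: keep the box only if every constraint may still hold."""
    ok = all(lo <= 0.0 <= hi for lo, hi in constraints(box))
    return box if ok else None

def branch_and_prune(box, eps):
    results, stack = [], [box]
    while stack:
        box = prune(stack.pop())
        if box is None:
            continue                          # no solution in this box
        widths = [hi - lo for lo, hi in box]
        i = widths.index(max(widths))         # branch on the widest interval
        if widths[i] <= eps:
            results.append(box)               # box is small enough
            continue
        lo, hi = box[i]
        mid = (lo + hi) / 2.0
        for part in ((lo, mid), (mid, hi)):
            stack.append(box[:i] + (part,) + box[i + 1:])
    return results

boxes = branch_and_prune(((-2.0, 2.0), (-2.0, 2.0)), 1e-3)
```

Every box in `boxes` lies close to one of the two solutions reported above. Because the stand-in pruning is much weaker than box-consistency, it needs far more branchings than Newton and may return several adjacent boxes around each solution.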

3  Preliminaries 

In this section, we review some basic concepts needed for this paper, including interval arithmetic
and  the  representation  of  constraints.  More  information  on  interval  arithmetic  can  be  found  in 
many  places  (e.g.,  [1,  8,  7,  20,  21]).  Our  definitions  are  slightly  non-standard. 

3.1  Interval  Arithmetic 

We consider $\Re^\infty = \Re \cup \{-\infty, \infty\}$, the set of real numbers extended with the two infinity symbols,
and the natural extension of the relation $<$ to this set. We also consider a finite subset $\mathcal{F}$ of
$\Re^\infty$ containing $-\infty$, $\infty$, $0$. In practice, $\mathcal{F}$ corresponds to the floating-point numbers used in the
implementation.

Definition 1 [Interval] An interval $[a, b]$ with $a, b \in \mathcal{F}$ is the set of real numbers

$$\{\, r \in \Re \mid a \le r \le b \,\}.$$

The set of intervals is denoted by $\mathcal{I}$ and is ordered by set inclusion.^

Definition 2 [Enclosure and Hull] Let $S$ be a subset of $\Re$. The enclosure of $S$, denoted by $\overline{S}$ or
$box\{S\}$, is the smallest interval $I$ such that $S \subseteq I$. We often write $\overline{r}$ instead of $\overline{\{r\}}$ for $r \in \Re$. The
interval hull of $I_1$ and $I_2$, denoted by $I_1 \uplus I_2$, is defined as $box\{I_1 \cup I_2\}$.

We denote real numbers by the letters $r, v$, $\mathcal{F}$-numbers by the letters $a, b, l, m, u$, intervals by the
letter $I$, real functions by the letters $f, g$ and interval functions by the letters $F, G$, all possibly
subscripted. We use $a^+$ (resp. $a^-$) to denote the smallest (resp. largest) $\mathcal{F}$-number strictly greater
(resp. smaller) than the $\mathcal{F}$-number $a$. To capture outward rounding, we use $\lceil r \rceil$ (resp. $\lfloor r \rfloor$) to
return the smallest (resp. largest) $\mathcal{F}$-number greater (resp. smaller) or equal to the real number
$r$. We also use $\vec{I}$ to denote a box $(I_1, \ldots, I_n)$ and $\vec{r}$ to denote a tuple $(r_1, \ldots, r_n)$. $\mathcal{Q}$ is the set of
rational numbers and $\mathcal{N}$ is the set of natural numbers. Finally, we use the following notations.

$$left([l, u]) = l$$
$$right([l, u]) = u$$
$$center([l, u]) = \lfloor (l + u)/2 \rfloor$$

The  fundamental  concept  of  interval  arithmetic  is  the  notion  of  interval  extension. 

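Definition 3 [Interval Extension] $F : \mathcal{I}^n \to \mathcal{I}$ is an interval extension of $f : \Re^n \to \Re$ iff
$$\forall I_1 \ldots I_n \in \mathcal{I} : r_1 \in I_1, \ldots, r_n \in I_n \Rightarrow f(r_1, \ldots, r_n) \in F(I_1, \ldots, I_n).$$

An interval relation $C : \mathcal{I}^n \to Bool$ is an interval extension of a relation $c : \Re^n \to Bool$ iff
$$\forall I_1 \ldots I_n \in \mathcal{I} : r_1 \in I_1, \ldots, r_n \in I_n \Rightarrow [\, c(r_1, \ldots, r_n) \Rightarrow C(I_1, \ldots, I_n) \,].$$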

^Our  intervals  are  usually  called  floating-point  intervals  in  the  literature. 
Example 1 The interval function $\oplus$ defined as

$$[a_1, b_1] \oplus [a_2, b_2] = [\lfloor a_1 + a_2 \rfloor, \lceil b_1 + b_2 \rceil]$$

is an interval extension of addition of real numbers. The interval relation $\doteq$ defined as

$$I_1 \doteq I_2 \Leftrightarrow (I_1 \cap I_2 \neq \emptyset)$$

is an interval extension of the equality relation on real numbers.

It is important to stress that a real function (resp. relation) can be extended in many ways. For instance,
the interval function $\oplus$ is the most precise interval extension of addition (i.e., it returns the smallest
possible interval containing all real results) while a function always returning $[-\infty, \infty]$ would be
the least accurate.
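As an illustration of Example 1, the following sketch implements an interval extension of addition and the interval equality relation on (lo, hi) tuples of Python floats. It rests on assumptions not made in the paper: outward rounding is approximated by stepping one floating-point number outward with `math.nextafter` (Python 3.9 or later), which is always safe but not necessarily the tightest enclosure, so it is a valid interval extension rather than the most precise one.

```python
import math
from fractions import Fraction

def add(x, y):
    """Interval extension of +: lower bound pushed down, upper bound pushed up."""
    lo = math.nextafter(x[0] + y[0], -math.inf)
    hi = math.nextafter(x[1] + y[1], math.inf)
    return (lo, hi)

def eq(x, y):
    """Interval equality: true iff the two intervals have a common point."""
    return max(x[0], y[0]) <= min(x[1], y[1])

s = add((0.1, 0.2), (0.2, 0.3))
# Exact rational sums of the stored endpoints, used to check the enclosure.
exact_lo = Fraction(0.1) + Fraction(0.2)
exact_hi = Fraction(0.2) + Fraction(0.3)
encloses = Fraction(s[0]) <= exact_lo and exact_hi <= Fraction(s[1])
```

The `Fraction` comparison checks that the returned interval really contains the exact real sums of the endpoints, which is the defining property of an interval extension.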

In the following, we assume fixed interval extensions for the basic real operators $+$, $-$, $\times$ and
exponentiation (for instance, the interval extension of $+$ is defined by $\oplus$) and the basic real relations
$=$, $\ge$. In addition, we overload the real symbols and use them for their interval extensions. Finally,
we  denote  relations  by  the  letter  c  possibly  subscripted,  interval  relations  by  the  letter  C  possibly 
subscripted.  Note  that  constraints  and  relations  are  used  as  synonyms  in  this  paper. 


3.2  Unions  of  Intervals 


Even  though  many  basic  real  operators  can  be  naturally  extended  to  work  on  intervals,  division 
creates  problems  if  the  interval  used  for  dividing  includes  0.  A  tight  extension  of  division  to  intervals 
can  be  best  expressed  if  its  result  is  allowed  to  be  a  union  of  intervals  [10,  6,  12].  Assuming  that 
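$c \le 0 \le d$ and $c < d$, $[a, b]/[c, d]$ is defined as follows:

$$[a, b]/[c, d] = \begin{cases}
[\lfloor b/c \rfloor, \infty] & \text{if } b \le 0 \text{ and } d = 0 \\
[-\infty, \lceil b/d \rceil] \cup [\lfloor b/c \rfloor, \infty] & \text{if } b \le 0 \text{ and } c < 0 < d \\
[-\infty, \lceil b/d \rceil] & \text{if } b \le 0 \text{ and } c = 0 \\
[-\infty, \infty] & \text{if } a < 0 < b \\
[-\infty, \lceil a/c \rceil] & \text{if } a \ge 0 \text{ and } d = 0 \\
[-\infty, \lceil a/c \rceil] \cup [\lfloor a/d \rfloor, \infty] & \text{if } a \ge 0 \text{ and } c < 0 < d \\
[\lfloor a/d \rfloor, \infty] & \text{if } a \ge 0 \text{ and } c = 0.
\end{cases}$$

When $[c, d] = [0, 0]$, $[a, b]/[c, d] = [-\infty, \infty]$. The case where $0 \notin [c, d]$ is easy to define. Note also
that other operations such as addition and subtraction can also be extended to work with unions
of intervals.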
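The case analysis for division by an interval containing 0 can be transcribed almost literally. The sketch below is an illustration under assumptions: intervals are (lo, hi) tuples of floats, the result is a list of one or two intervals, the function name `extended_div` is hypothetical, and outward rounding is approximated by one `math.nextafter` step (Python 3.9 or later).

```python
import math

INF = math.inf

def down(v):
    return math.nextafter(v, -INF)

def up(v):
    return math.nextafter(v, INF)

def extended_div(num, den):
    """[a,b]/[c,d] assuming c <= 0 <= d; returns a list of one or two intervals."""
    a, b = num
    c, d = den
    if c == 0.0 and d == 0.0:
        return [(-INF, INF)]
    if a < 0.0 < b:
        return [(-INF, INF)]
    if b <= 0.0:
        if d == 0.0:
            return [(down(b / c), INF)]
        if c == 0.0:
            return [(-INF, up(b / d))]
        return [(-INF, up(b / d)), (down(b / c), INF)]
    # Remaining case: a >= 0.
    if d == 0.0:
        return [(-INF, up(a / c))]
    if c == 0.0:
        return [(down(a / d), INF)]
    return [(-INF, up(a / c)), (down(a / d), INF)]

two_parts = extended_div((1.0, 2.0), (-1.0, 1.0))    # roughly (-inf, -1] U [1, inf)
one_part = extended_div((-2.0, -1.0), (0.0, 4.0))    # roughly (-inf, -0.25]
```

Returning a union instead of $[-\infty, \infty]$ keeps the gap around 0, which is what makes the subsequent intersection with another interval useful.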

A typical use of unions of intervals in interval arithmetic is the intersection of an interval $I$
with the result of an operation of the form $I_c - I_n/I_d$ to produce a new interval $I'$. To increase
precision, $I_c - I_n/I_d$ is computed with unions of intervals producing a result of the form $I_1$ or $I_1 \cup I_2$.
This result should be intersected with $I$ to produce a single interval as result (i.e., not a union of
intervals). A generalized intersection operation $\sqcap$ defined as

$$(I_1 \cup \ldots \cup I_n) \sqcap (I'_1 \cup \ldots \cup I'_m) = (I_1 \cap I'_1) \uplus (I_1 \cap I'_2) \uplus \ldots \uplus (I_n \cap I'_m)$$

is used for this purpose, i.e., we compute

$$I \sqcap (I_c - I_n/I_d).$$
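The generalized intersection can be sketched as the interval hull of all pairwise intersections. The representation (unions as lists of (lo, hi) tuples, `None` for the empty interval) and the function names are assumptions made for this illustration.

```python
def intersect(x, y):
    """Ordinary intersection of two intervals; None if they are disjoint."""
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None

def hull(x, y):
    """Interval hull of two (possibly empty) intervals."""
    if x is None:
        return y
    if y is None:
        return x
    return (min(x[0], y[0]), max(x[1], y[1]))

def gen_intersect(us, vs):
    """Hull of all pairwise intersections of two unions of intervals."""
    result = None
    for x in us:
        for y in vs:
            result = hull(result, intersect(x, y))
    return result

inf = float("inf")
union = [(-inf, -1.0), (1.0, inf)]          # e.g. the result of an extended division
a = gen_intersect([(-3.0, 5.0)], union)     # both parts survive: hull is (-3, 5)
b = gen_intersect([(0.0, 5.0)], union)      # only the right part survives: (1, 5)
c = gen_intersect([(-0.5, 0.5)], union)     # falls in the gap: empty
```

The second call shows the benefit of keeping the union: intersecting [0, 5] with $[-\infty, \infty]$ would give no reduction, while the union yields [1, 5].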




3.3  Constraint  Representations 

It  is  well-known  that  different  computer  representations  of  a  real  function  produce  different  results 
when  evaluated  with  floating-point  numbers  on  a  computer.  As  a  consequence,  the  way  constraints 
are written may have an impact on the behaviour of the algorithm. For this reason, a constraint
or a function in this paper is considered to be an expression written in some formal language
by composing real variables, rational numbers, some predefined real numbers (e.g., $\pi$), arithmetic
operations such as $+$, $-$, $\times$ and exponentiation to a natural number, parentheses, and relation
symbols such as $\le$, $=$.^ We will abuse notation by denoting functions (resp. constraints) and their
representations by the same symbol. Real variables in constraints will be taken from a finite (but
arbitrarily large) set $\{x_1, \ldots, x_n\}$, the set of all real functions is denoted by Function, and the set of
all real constraints is denoted by Constraint. Similar conventions apply to interval functions and
constraints. Interval variables will be taken from a finite (but arbitrarily large) set $\{X_1, \ldots, X_n\}$,
interval functions will be denoted by the letter $F$ and interval constraints by the letter $C$. The
set  of  all  interval  functions  is  denoted  by  FUNCTION  while  the  set  of  all  interval  constraints  is 
denoted  by  CONSTRAINT.  For  simplicity  of  exposition,  we  restrict  attention  to  equations.  It  is 
straightforward  to  generalize  our  results  to  inequalities  (see  Section  5.6). 

4  The  Branch  &  Prune  Algorithm 

This  section  describes  the  branch  &  prune  algorithm  Newton.  Section  4.1  defines  box-consistency, 
the  key  concept  behind  our  algorithm.  Section  4.2  shows  how  box-consistency  can  be  instantiated 
to  produce  various  pruning  operators  achieving  various  tradeoffs  between  accuracy  and  efficiency. 
Section  4.3  defines  a  conditioning  operator  used  in  Newton  to  improve  the  effectiveness  of  box- 
consistency.  Section  4.4  specifies  the  pruning  in  Newton.  Section  4.5  describes  the  algorithm. 
Recall that we assume that all constraints are defined over variables $x_1, \ldots, x_n$.

4.1  Box  Consistency 

Box-consistency [2] is an approximation of arc-consistency, a notion well-known in artificial
intelligence [17] which states a simple local condition on a constraint $c$ and the set of possible values for
each of its variables, say $D_1, \ldots, D_n$. Informally speaking, a constraint $c$ is arc-consistent if none
of the $D_i$ can be reduced by using projections of $c$.

Definition 4 [Projection Constraint] A projection constraint $(c, i)$ is the association of a constraint
$c$ and of an index $i$ ($1 \le i \le n$). Projection constraints are denoted by the letter $p$, possibly
subscripted.

Example 2 Consider the constraint $x_1^2 + x_2^2 = 1$. Both $(x_1^2 + x_2^2 = 1, 1)$ and $(x_1^2 + x_2^2 = 1, 2)$ are
projection constraints.

Definition 5 [Arc-Consistency] A projection constraint $(c, i)$ is arc-consistent wrt $(D_1, \ldots, D_n)$ iff
$$D_i = D_i \cap \{\, r_i \mid \exists r_1 \in D_1, \ldots, \exists r_{i-1} \in D_{i-1}, \exists r_{i+1} \in D_{i+1}, \ldots, \exists r_n \in D_n : c(r_1, \ldots, r_n) \,\}.$$

A constraint $c$ is arc-consistent wrt $(D_1, \ldots, D_n)$ if each of its projections is arc-consistent wrt
$(D_1, \ldots, D_n)$. A system of constraints $S$ is arc-consistent wrt $(D_1, \ldots, D_n)$ if each constraint in $S$
is arc-consistent wrt $(D_1, \ldots, D_n)$.

*It  is  easy  to  extend  the  language  to  include  functions  such  as  sin,  cos,  e,  .... 
Example 3 Let $c$ be the constraint $x_1^2 + x_2^2 = 1$. $c$ is arc-consistent wrt $([-1, 1], [-1, 1])$ but is
not arc-consistent wrt $([-1, 1], [-2, 2])$ since, for instance, there is no value $r_1$ for $x_1$ in $[-1, 1]$
such that $r_1^2 + 2^2 = 1$.

Arc-consistency  cannot  be  computed  in  general  when  working  with  real  numbers  and  polynomial 
constraints  and  simple  approximations  to  capture  machine  precision  are  very  expensive  to  compute. 
For  instance,  a  simple  approximation  of  arc-consistency  consists  in  working  with  intervals  and 
approximating  the  set  computed  by  arc-consistency  to  return  an  interval,  i.e., 

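$$I_i = box\{\, I_i \cap \{\, r_i \mid \exists r_1 \in I_1, \ldots, \exists r_{i-1} \in I_{i-1}, \exists r_{i+1} \in I_{i+1}, \ldots, \exists r_n \in I_n : c(r_1, \ldots, r_n) \,\} \,\}.$$

This condition, used in systems like [27, 3], is easily enforced on simple constraints such as

$$x_1 = x_2 + x_3, \quad x_1 = x_2 - x_3, \quad x_1 = x_2 \times x_3$$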


but  it  is  also  computationally  very  expensive  for  complex  constraints  with  multiple  occurrences  of 
the  same  variables.  Moreover,  decomposing  complex  constraints  into  simple  constraints  entails  a 
substantial loss in pruning, making this approach impractical on many applications. See [2] for
experimental  results  on  this  approach  and  their  comparison  with  the  approach  presented  in  this 
paper. 

The  notion  of  box-consistency  introduced  in  [2]  is  a  coarser  approximation  of  arc-consistency 
which  provides  a  much  better  trade-off  between  efficiency  and  pruning.  It  consists  in  replacing  the 
existential  quantification  in  the  above  condition  by  the  evaluation  of  an  interval  extension  of  the 
constraint  on  the  intervals  of  the  existential  variables.  Since  there  are  many  interval  extensions  for 
a  single  constraint,  we  define  box-consistency  in  terms  of  interval  constraints. 


Definition 6 [Interval Projection Constraint] An interval projection constraint $(C, i)$ is the
association of an interval constraint $C$ and of an index $i$ ($1 \le i \le n$). Interval projection constraints
are denoted by the letter $P$, possibly subscripted.

Definition 7 [Box-Consistency] An interval projection constraint $(C, i)$ is box-consistent wrt
$\vec{I} = (I_1, \ldots, I_n)$ iff

$$C(I_1, \ldots, I_{i-1}, [l, l^+], I_{i+1}, \ldots, I_n) \;\wedge\; C(I_1, \ldots, I_{i-1}, [u^-, u], I_{i+1}, \ldots, I_n)$$

where $l = left(I_i)$ and $u = right(I_i)$. An interval constraint is box-consistent wrt $\vec{I}$ if each of its
projections is box-consistent wrt $\vec{I}$. A system of interval constraints is box-consistent wrt $\vec{I}$ iff each
interval constraint in the system is box-consistent wrt $\vec{I}$.


Intuitively speaking, the above condition states that the $i$th interval cannot be pruned further
using the unary interval constraint obtained by replacing all variables but $X_i$ by their intervals,
since the boundaries satisfy the unary constraint. Note also that the above condition is equivalent
to

$$I_i = box\{\, r_i \in I_i \mid C(I_1, \ldots, I_{i-1}, \overline{r_i}, I_{i+1}, \ldots, I_n) \,\}$$

which shows clearly that box-consistency is an approximation of arc-consistency.^ The difference
between arc-consistency and box-consistency appears essentially when there are multiple
occurrences of the same variable.

^It is interesting to note that this definition is also related to the theorem of Miranda [26]. In this case,
box-consistency can be seen as replacing universally quantified variables by the intervals on which they range.
Example 4 Consider the constraint $x_1 + x_2 - x_1 = 0$. The constraint is not arc-consistent wrt
$([-1, 1], [-1, 1])$ since there is no value $r_1$ for $x_1$ which satisfies $r_1 + 1 - r_1 = 0$. On the other hand,
the interval constraint $X_1 + X_2 - X_1 \doteq 0$ is box-consistent, since $([-1, 1] + [-1, -1^+] - [-1, 1]) \cap [0, 0]$
and $([-1, 1] + [1^-, 1] - [-1, 1]) \cap [0, 0]$ are non-empty.
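The check in Example 4 can be reproduced with a few lines of code. The sketch below tests the two boundary conditions of Definition 7 for the natural extension of $x_1 + x_2 - x_1 = 0$. It assumes intervals as (lo, hi) tuples of floats, uses `math.nextafter` (Python 3.9 or later) for $l^+$ and $u^-$, omits outward rounding in the arithmetic, and uses 0-based indices, whereas the paper numbers variables from 1.

```python
import math

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):
    return (x[0] - y[1], x[1] - y[0])

def C(x1, x2):
    """Natural extension of x1 + x2 - x1 = 0: does the enclosure meet [0, 0]?"""
    lo, hi = sub(add(x1, x2), x1)
    return lo <= 0.0 <= hi

def box_consistent(C, box, i):
    """Definition 7 for the projection (C, i), with i counted from 0."""
    l, u = box[i]
    l_plus = math.nextafter(l, math.inf)     # smallest float above l
    u_minus = math.nextafter(u, -math.inf)   # largest float below u
    left = box[:i] + ((l, l_plus),) + box[i + 1:]
    right = box[:i] + ((u_minus, u),) + box[i + 1:]
    return C(*left) and C(*right)

unit_box = ((-1.0, 1.0), (-1.0, 1.0))
consistent_x2 = box_consistent(C, unit_box, 1)
narrow_box = ((-0.1, 0.1), (0.5, 1.0))
consistent_narrow = box_consistent(C, narrow_box, 1)
```

On `unit_box` the check succeeds even though only $x_2 = 0$ satisfies the real constraint, because the two occurrences of $x_1$ are treated independently. On `narrow_box` the interval of $x_1$ is small enough for the left boundary of $x_2$ to be rejected.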

4.2  Interval  Extensions  for  Box  Consistency 

Box-consistency  strongly  depends  on  the  interval  extensions  chosen  for  the  constraints  and  different 
interval  extensions  can  produce  very  different  (often  incomparable)  tradeoffs  between  pruning  and 
computational  complexity.  In  this  section,  we  consider  three  extensions  used  in  Newton:  natural 
interval  extension,  distributed  interval  extension,  and  Taylor  interval  extension. 

4.2.1  Natural  Interval  Extension 

The simplest extension of a function (resp. of a constraint) is its natural interval extension.
Informally speaking, it consists in replacing each number by the smallest interval enclosing it, each real
variable by an interval variable, each real operation by its fixed interval extension and each real
relation by its fixed interval extension.

Example 5 [Natural Interval Extension] The natural interval extension of the function $x_1(x_2 + x_3)$
is the interval function $X_1(X_2 + X_3)$. The natural interval extension of the constraint $x_1(x_2 + x_3) = 0$
is the interval constraint $X_1(X_2 + X_3) \doteq 0$.

The  advantage  of  this  extension  is  that  it  preserves  the  way  constraints  are  written  and  hence 
users  of  the  system  can  choose  constraint  representations  particularly  appropriate  for  the  problem 
at hand. A very nice application where this extension is fundamental is the Moré-Cosnard
discretization of a nonlinear integral equation (see Section 6.2). Using the natural extension allows
users  to  minimize  the  problem  of  dependency  of  interval  arithmetic  and  hence  to  increase  precision. 
In the following, if $f$ (resp. $c$) is a real function (resp. constraint), we denote by $\hat{f}$ (resp. $\hat{c}$) its
natural extension.

4.2.2  Distributed  Interval  Extension 

The  second  interval  extension  used  by  Newton  does  not  preserve  the  way  constraints  are  written 
but  uses  a  distributed  form  of  the  constraints.  The  key  advantage  of  this  extension  is  that  it 
allows  the  algorithm  to  enforce  box-consistency  by  applying  interval  Newton  method  on  univariate 
real  functions.  The  real  functions  are  derived  from  univariate  interval  constraints  obtained  by 
replacing  all  but  one  variable  by  their  intervals.  As  a  consequence,  applying  box-consistency  will 
be  particularly  efficient,  although  the  pruning  may  be  weaker  than  for  the  natural  extension  due 
to  the  dependency  problem  of  interval  arithmetics.®  Intuitively,  the  distributed  interval  extension 
should  be  viewed  as  a  way  to  speed  up  the  computation  of  box-consistency  on  the  natural  extension. 
However,  it  may  happen  that  it  gives  more  precision  than  the  natural  extension  if  users  are  not 
careful  in  stating  their  constraints. 

®Note  that  it  is  not  always  necessary  to  go  through  the  distributed  form  to  obtain  the  above  property  but  Newton 
adopts  it  for  simplicity. 
Definition 8 [Distributed Form] A constraint $c$ in (simplified) sum of products form

$$m_1 + \cdots + m_k = 0,$$

where each monomial $m_i$ is of the form $q x_1^{e_1} \cdots x_n^{e_n}$ with $q \in \mathcal{Q}$ and $e_i \in \mathcal{N}$, is said to be in
distributed form.®

Definition  9  [Distributed  Interval  Extension]  The  distributed  interval  extension  of  a  function  / 
(resp.  constraint  c)  is  the  natural  extension  of  its  distributed  form. 

Example 6 [Distributed Interval Extension] The distributed interval extension of the function
$x_1(x_2 + x_3)$ is the interval function $X_1 X_2 + X_1 X_3$. The distributed interval extension of the
constraint $x_1(x_2 + x_3) = 0$ is the interval constraint $X_1 X_2 + X_1 X_3 \doteq 0$.

In the following, the distributed interval extension of a function $f$ (resp. of a constraint $c$) is
denoted by $\tilde{f}$ (resp. $\tilde{c}$).
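The difference between the natural and the distributed extension of $x_1(x_2 + x_3)$ can be seen by evaluating both on the same box. The sketch below uses plain (lo, hi) tuples without outward rounding, and the box is chosen only for illustration.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(products), max(products))

def natural(x1, x2, x3):
    """Natural extension X1 (X2 + X3): X1 occurs once."""
    return mul(x1, add(x2, x3))

def distributed(x1, x2, x3):
    """Distributed extension X1 X2 + X1 X3: X1 occurs twice."""
    return add(mul(x1, x2), mul(x1, x3))

box = ((-1.0, 1.0), (1.0, 2.0), (-2.0, -1.0))
nat = natural(*box)        # (-1.0, 1.0)
dist = distributed(*box)   # (-4.0, 4.0)
```

On this box the distributed form is four times wider, because the two occurrences of $X_1$ are evaluated independently (the dependency problem mentioned above).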

4.2.3  Taylor  Interval  Extension 

The  last  interval  extension  we  introduce  is  based  on  the  Taylor  expansion  around  a  point.  This 
extension  is  an  example  of  centered  forms  which  are  interval  extensions  introduced  by  Moore  [20] 
and  studied  by  many  authors,  since  they  have  important  properties.  The  Taylor  interval  extension 
of  a  constraint  is  parametrized  by  the  intervals  for  the  variables  in  the  constraint.  It  also  assumes 
that  the  constraint  which  it  is  applied  to  is  of  the  form  /  =  0  where  /  denotes  a  function  which 
has  continuous  partial  derivatives.  Given  these  assumptions,  the  key  idea  behind  the  extension  is 
to  apply  a  Taylor  expansion  of  the  function  around  the  center  of  the  box  and  to  bound  the  rest  of 
the  series  using  the  box. 

Definition 10 [Taylor Interval Extension] Let $c$ be a constraint $f = 0$, $f$ be a function with
continuous partial derivatives, $\vec{I}$ be a box $(I_1, \ldots, I_n)$, and $m_i$ be the center of $I_i$. The Taylor
interval extension of $c$ wrt $\vec{I}$, denoted by $c^{t(\vec{I})}$, is the interval constraint

$$f(m_1, \ldots, m_n) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(I_1, \ldots, I_n) \, (X_i - m_i) = 0.$$

In the current version of our system, the partial derivatives are computed using automatic
differentiation.
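To make the definition concrete, the sketch below evaluates the left-hand side of the Taylor interval extension of $x_1^2 + x_2^2 - 1 = 0$ over a whole box, i.e. with each $X_i$ replaced by $I_i$. It is an illustration only: the partial derivatives $2x_i$ are written by hand instead of being obtained by automatic differentiation, intervals are (lo, hi) tuples, and outward rounding is omitted.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    return (min(products), max(products))

def taylor_circle(box):
    """Enclosure of f = x1^2 + x2^2 - 1 over the box via its Taylor extension."""
    m = [(lo + hi) / 2.0 for lo, hi in box]          # center of the box
    f_m = m[0] ** 2 + m[1] ** 2 - 1.0                # f evaluated at the center
    result = (f_m, f_m)
    for (lo, hi), mi in zip(box, m):
        grad = (2.0 * lo, 2.0 * hi)                  # interval extension of df/dx_i = 2 x_i
        result = add(result, mul(grad, (lo - mi, hi - mi)))
    return result

enclosure = taylor_circle(((-1.0, 0.0), (0.0, 1.0)))   # (-2.5, 1.5)
```

On the box ([-1, 0], [0, 1]) used in Section 2 the result is [-2.5, 1.5], which contains the true range [-1, 1] of the function. On such a wide box the Taylor form is loose; it becomes sharp as the box shrinks around a solution.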

4.3  Conditioning 

It  is  interesting  to  note  that  box-consistency  on  the  Taylor  interval  extension  is  closely  related  to 
Hansen-Sengupta’s operator [8], which is an improvement over Krawczyk’s operator [15]. Hansen
and Smith [9] also argued that these operators are more effective for a system $\{f_1 = 0, \ldots, f_n = 0\}$
wrt a box $(I_1, \ldots, I_n)$ when the interval Jacobian

$$M_{ij} = \frac{\partial f_i}{\partial x_j}(I_1, \ldots, I_n) \qquad (1 \le i, j \le n)$$

®The distributed version can easily be turned into a canonical representation for constraints.
is diagonally dominant, i.e.,


$$mig(M_{i,i}) \ge \sum_{j=1,\, j \neq i}^{n} mag(M_{i,j})$$

where

$$mig([l, u]) = \min(|l|, |u|) \quad \text{and} \quad mag([l, u]) = \max(|l|, |u|).$$
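The diagonal dominance test can be written directly from these definitions. In this sketch the interval matrix is a nested list of (lo, hi) tuples, `mig` and `mag` follow the formulas exactly as stated above, and the two example matrices are hypothetical.

```python
def mig(x):
    """mig([l, u]) = min(|l|, |u|), as defined in the text."""
    return min(abs(x[0]), abs(x[1]))

def mag(x):
    """mag([l, u]) = max(|l|, |u|)."""
    return max(abs(x[0]), abs(x[1]))

def diagonally_dominant(M):
    """True iff mig(M[i][i]) >= sum of mag(M[i][j]) over j != i, for every row i."""
    n = len(M)
    return all(
        mig(M[i][i]) >= sum(mag(M[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )

dominant = [[(4.0, 5.0), (-1.0, 1.0)],
            [(0.0, 2.0), (-6.0, -3.0)]]
not_dominant = [[(1.0, 2.0), (-3.0, 1.0)],
                [(0.0, 1.0), (2.0, 3.0)]]
```

For the first matrix each diagonal entry stays away from 0 by more than the off-diagonal entries can contribute. In the second matrix the first row fails, since 1 < 3.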

They also suggest a conditioning which consists in multiplying the linear relaxation by a real
matrix which is the inverse of the matrix obtained by taking the center of each $M_{ij}$. The
resulting system is generally solved through Gauss-Seidel iterations, giving the Hansen-Sengupta
operator. (See also [13, 14] for an extensive coverage of conditioners.) Newton exploits this idea
to improve the effectiveness of box-consistency on the Taylor interval extension. The conditioning
of Newton is abstracted by the following definition.

Definition 11 [Conditioning] Let $S = \{f_1 = 0, \ldots, f_n = 0\}$. A conditioning of $S$ is a
system $S' = \{f'_1 = 0, \ldots, f'_n = 0\}$ where

$$ f'_i = \sum_{k=1}^{n} A_{ik} f_k $$

where $A_{ik} \in Q$.

In its present implementation, Newton uses a conditioning
$cond(\{f_1 = 0, \ldots, f_n = 0\}, \vec{I})$ which returns a system
$\{f'_1 = 0, \ldots, f'_n = 0\}$ such that

$$ f'_i = \sum_{k=1}^{n} A_{ik} f_k $$

where

$$ M_{ij} = \widehat{\frac{\partial f_i}{\partial x_j}}(I_1, \ldots, I_n) \qquad (1 \le i, j \le n) $$

$$ B_{ij} = center(M_{ij}) $$

$$ A = \begin{cases} B^{-1} & \text{if } B \text{ is not singular} \\ I & \text{otherwise.} \end{cases} $$

Note that the computation of the inverse of $B$ is obtained by standard floating-point algorithms
and hence it is only an approximation of the actual inverse.
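The conditioning step can be sketched as follows. This is only an illustration under stated assumptions: intervals are `(lower, upper)` pairs, the inverse is computed by a plain Gauss-Jordan elimination in floating point (so, as the text notes, it is only an approximation), and the names `conditioning_matrix` and `condition` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
Matrix = List[List[float]]


def center(iv: Interval) -> float:
    return 0.5 * (iv[0] + iv[1])


def invert(B: Matrix, eps: float = 1e-12) -> Optional[Matrix]:
    """Gauss-Jordan with partial pivoting; None if B is numerically singular."""
    n = len(B)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < eps:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def conditioning_matrix(M: List[List[Interval]]) -> Matrix:
    """A = B^-1 with B[i][j] = center(M[i][j]); identity if B is singular."""
    n = len(M)
    B = [[center(M[i][j]) for j in range(n)] for i in range(n)]
    A = invert(B)
    if A is None:
        A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return A


def condition(fs: Sequence[Callable], A: Matrix) -> List[Callable]:
    """Return the conditioned functions f'_i = sum_k A[i][k] * f_k."""
    def make(i: int) -> Callable:
        return lambda x: sum(A[i][k] * fs[k](x) for k in range(len(fs)))
    return [make(i) for i in range(len(fs))]


# Hypothetical interval Jacobian whose center is [[2, 1], [0, 4]].
M = [[(1.0, 3.0), (0.0, 2.0)], [(-1.0, 1.0), (3.0, 5.0)]]
A = conditioning_matrix(M)
fs = [lambda x: 2 * x[0] + x[1] - 3, lambda x: 4 * x[1] - 4]
g = condition(fs, A)  # g[0](x) = x[0] - 1 and g[1](x) = x[1] - 1
```

For this linear example the conditioned system becomes diagonal, which is the situation in which the operators discussed above are most effective.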

4.4  Pruning  in  Newton 

We  now  describe  the  pruning  of  Newton.  The  key  idea  behind  Newton  is  to  apply  box-consistency 
at  each  node  of  the  search  tree,  i.e.,  Newton  reduces  the  current  intervals  for  the  variables  in  such 
a  way  that  the  constraint  system  is  box-consistent  wrt  the  reduced  intervals  and  no  solution  is 
removed.  The  pruning  is  performed  by  using  narrowing  operators  deduced  from  the  definition  of 
box-consistency.  These  operators  are  used  to  reduce  the  interval  of  a  variable  using  a  projection 
constraint. 

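Definition 12 [Box-Narrowing] Let $\langle C, i \rangle$ be an interval projection constraint and
$\langle I_1, \ldots, I_n \rangle$ be a box. The narrowing operator BOX-NARROW is defined as

$$ \text{BOX-NARROW}(\langle C, i \rangle, \langle I_1, \ldots, I_n \rangle) = \langle I_1, \ldots, I_{i-1}, I, I_{i+1}, \ldots, I_n \rangle $$

where $I$ is defined as the largest set included in $I_i$ such that $\langle C, i \rangle$ is
box-consistent with respect to $\langle I_1, \ldots, I_{i-1}, I, I_{i+1}, \ldots, I_n \rangle$.

Proposition 1 [Soundness of the Box-Narrowing] Let $C$ be an interval extension of $c$,
$\langle C, i \rangle$ be an interval projection constraint, $\langle I_1, \ldots, I_n \rangle$ be a
box, and

$$ \langle I_1, \ldots, I_{i-1}, I, I_{i+1}, \ldots, I_n \rangle = \text{BOX-NARROW}(\langle C, i \rangle, \langle I_1, \ldots, I_n \rangle). $$

Then

$$ r_1 \in I_1 \;\wedge\; \ldots \;\wedge\; r_n \in I_n \;\wedge\; c(r_1, \ldots, r_n) \;\Rightarrow\; r_i \in I. $$

Proof Assume that $r_1 \in I_1, \ldots, r_n \in I_n$, that $c(r_1, \ldots, r_n)$ holds, and that
$r_i \notin I$. Then either $r_i < l$ or $r_i > u$, where $I = [l, u]$. If $r_i < l$, we have that

$$ C(I_1, \ldots, I_{i-1}, [r_i, r_i^+], I_{i+1}, \ldots, I_n) $$

by definition of an interval extension and

$$ C(I_1, \ldots, I_{i-1}, [u^-, u], I_{i+1}, \ldots, I_n) $$

by hypothesis. Hence, $C$ is box-consistent wrt
$\langle I_1, \ldots, I_{i-1}, I', I_{i+1}, \ldots, I_n \rangle$ with $I' = [r_i, u]$, which
contradicts our hypothesis that $I$ is defined as the largest set included in $I_i$ such that
$\langle C, i \rangle$ is box-consistent with respect to
$\langle I_1, \ldots, I_{i-1}, I, I_{i+1}, \ldots, I_n \rangle$. The case $r_i > u$ is similar. □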

We are now in a position to define the pruning algorithm of Newton, which consists essentially
in applying the narrowing operators of each projection until no further reduction occurs. The
pruning algorithm is depicted in Figure 1. It first applies box-consistency on the natural and
distributed extensions until no further reduction occurs and then applies box-consistency on the
Taylor extension. The two steps are iterated until a fixpoint is reached. Termination of the
algorithm is guaranteed since the set $\mathcal{F}$ of floating-point numbers is finite and thus
the intervals can only be reduced finitely often.


4.5  The  Branch  and  Prune  Algorithm  Newton 

Figure 2 is a very high level description of the branch and prune algorithm highlighting the control
flow. The algorithm applies operation PRUNE on the initial box. If the resulting box is empty
(which means that one of its components is empty), then there is no solution by Proposition 1. If
the resulting box is small enough (as specified by the desired accuracy of the solutions), then it
is included in the result. Otherwise, the function BRANCH splits the box into two subboxes along one
dimension (variable) and the algorithm is applied recursively to each subbox. Variables for
splitting are chosen by BRANCH using a round-robin heuristic: if $\{x_1, \ldots, x_n\}$ is the set
of variables, then the algorithm splits the variables in the order $x_1, x_2, \ldots, x_n$ and
reiterates the process until a solution is found.

procedure PRUNE(in S: Set of Constraint; inout I: ℐ^n)
begin
   repeat
      Ip := I;
      BOX-PRUNE({⟨C, i⟩ | C is the natural interval extension of some c ∈ S & 1 ≤ i ≤ n} ∪
                {⟨C, i⟩ | C is the distributed interval extension of some c ∈ S & 1 ≤ i ≤ n}, I);
      BOX-PRUNE({⟨c_t(I), i⟩ | c ∈ cond(S, I) & 1 ≤ i ≤ n}, I);
   until I = Ip;
end

procedure BOX-PRUNE(in P: Set of Interval Projection Constraint; inout I: ℐ^n)
begin
   repeat
      Ip := I;
      I := ∩ {BOX-NARROW(P, I) | P ∈ P & 1 ≤ i ≤ n};
   until I = Ip;
end

Figure 1: Pruning in Newton


function BranchAndPrune(S: Set of Constraint; I0: ℐ^n): Set of ℐ^n;
begin
   I := PRUNE(S, I0);
   if ¬ IsEmpty(I) then
      if IsSmallEnough(I) then
         return {I}
      else
         (I1, I2) := BRANCH(I);
         return BranchAndPrune(S, I1) ∪ BranchAndPrune(S, I2)
      endif
   else
      return ∅
   endif
end

Figure 2: The Branch and Prune Algorithm


function LNAR(F, F': ℐ → ℐ; I: ℐ): ℐ;
begin
   r := right(I);
   if 0 ∉ F(I) then
      return ∅;
   I := N*(F, F', I);
   if 0 ∈ F([left(I), left(I)+]) then
      return I ⊎ r
   else
      (I1, I2) := SPLIT(I);
      if LNAR(F, F', I1) ≠ ∅ then
         return LNAR(F, F', I1) ⊎ r
      else
         return LNAR(F, F', I2) ⊎ r
   endif
end;

Figure 3: The function LNAR
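The control flow of the branch and prune algorithm can be illustrated by a small Python sketch. It is not the Newton implementation: the `prune` function below is a deliberately weak stand-in for PRUNE that only discards a box when the natural interval evaluation of some constraint excludes zero, outward rounding is ignored, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]
IntervalFn = Callable[[Box], Interval]


def isqr(a: Interval) -> Interval:
    """Interval square (exact range of x*x over a, up to rounding)."""
    lo, hi = a
    if lo >= 0.0:
        return (lo * lo, hi * hi)
    if hi <= 0.0:
        return (hi * hi, lo * lo)
    return (0.0, max(lo * lo, hi * hi))


def prune(constraints: Sequence[IntervalFn], box: Box) -> Optional[Box]:
    """Simplified stand-in for PRUNE: None (empty) if some F(box) excludes 0."""
    for F in constraints:
        lo, hi = F(box)
        if lo > 0.0 or hi < 0.0:
            return None
    return box


def branch_and_prune(constraints: Sequence[IntervalFn], box: Box,
                     eps: float = 1e-3, depth: int = 0) -> List[Box]:
    pruned = prune(constraints, box)
    if pruned is None:                      # empty box: no solution here
        return []
    if all(hi - lo <= eps for lo, hi in pruned):
        return [pruned]                     # small enough: report the box
    i = depth % len(pruned)                 # round-robin choice of variable
    lo, hi = pruned[i]
    mid = 0.5 * (lo + hi)
    left = pruned[:i] + ((lo, mid),) + pruned[i + 1:]
    right = pruned[:i] + ((mid, hi),) + pruned[i + 1:]
    return (branch_and_prune(constraints, left, eps, depth + 1)
            + branch_and_prune(constraints, right, eps, depth + 1))


def circle(b: Box) -> Interval:             # x^2 + y^2 - 1 = 0
    x2, y2 = isqr(b[0]), isqr(b[1])
    return (x2[0] + y2[0] - 1.0, x2[1] + y2[1] - 1.0)


def line(b: Box) -> Interval:               # x - y = 0
    return (b[0][0] - b[1][1], b[0][1] - b[1][0])


boxes = branch_and_prune([circle, line], ((-2.0, 2.0), (-2.0, 2.0)))
```

On this toy system the returned boxes cluster around the two intersection points of the unit circle with the line x = y. A real implementation would replace `prune` by box-consistency, which is what lets Newton solve several of the benchmarks below without any branching.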


5 Implementation of Box-Consistency

Figure 2 is a precise description of the branch and prune algorithm which leaves open the
implementation of procedure BOX-NARROW. The purpose of this section is to describe how this
procedure is implemented in Newton. The basic idea of the implementation is to use a different
implementation of procedure BOX-NARROW for each interval extension in order to exploit their
specific properties. We thus present three procedures in the section: BOX-NARROW-NE,
BOX-NARROW-DE, and BOX-NARROW-TE. In addition, it is more convenient to define them in terms of
projection constraints (instead of in terms of interval projection constraints).⁷ The rest of this
section is organized as follows. We start by describing a basic tool used in the implementations,
then describe the various narrowing operators, discuss how to prove the existence of solutions, and
conclude with some implementation issues.


5.1  Extreme  Zeros  of  an  Interval  Function 

Box-consistency can often be reduced to two subproblems which, informally speaking, consist in
shrinking the left (resp. the right) of an interval $I$ to the leftmost (resp. rightmost) zero of
a univariate interval function $F$ in $I$. The univariate interval function $F$ is an extension of
a real function $f$ which is either univariate, in which case $F$ is a traditional interval
extension, or multivariate, in which case $F$ is obtained by taking an interval extension of $f$
and substituting all variables but one, say $x_i$, by their intervals. In addition, we will have at
our disposal a function $F'$, the "derivative" of $F$, which is either an interval extension of the
derivative of $f$ (univariate case) or an interval extension of the partial derivative of $f$ wrt
$x_i$ in which all variables but $x_i$ have been replaced by their intervals.

⁷ There is no difficulty in modifying the algorithm of Figure 2 to accommodate this change.

The subproblems can be computed using a variation of the univariate interval Newton method.
The method uses the following property for pruning the search space

$$ 0 \in F(I) \Rightarrow 0 \in N(F, F', I) $$

that is, the zeros of $F$ in $I$ are retained in $N(F, F', I)$, where

$$ N(F, F', I) = I \cap \left( center(I) - \frac{F(center(I))}{F'(I)} \right). $$

More precisely, the algorithm uses $N^*(F, F', I) = \bigcap_{i=0}^{\infty} I_i$ where

$$ I_0 = I $$

$$ I_{i+1} = N(F, F', I_i) \qquad (0 \le i). $$

Figure 3 depicts a simple function LNAR to shrink to the leftmost zero (function RNAR is defined
similarly). It makes use of a function SPLIT which, given an interval $I$, returns two intervals
$I_1$ and $I_2$ such that $I = I_1 \cup I_2$ and $left(I_2) = right(I_1)$. The algorithm first
applies the pruning procedure. It terminates if the left zero has been found or the interval cannot
contain a zero. Otherwise, the interval is split into two subintervals. The leftmost interval is
explored first. If it does not contain a zero, the rightmost interval is explored. It is worth
mentioning that the reduction on the right bound must not be used as part of the result, since the
function would not meet its specification in this case. Function RNAR is responsible for finding
the rightmost zero.
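The leftmost-zero search can be sketched in Python as follows. Several simplifications are assumptions of this sketch and not properties of Newton: intervals are `(lower, upper)` float pairs, the canonical interval $[l, l^+]$ is approximated by $[l, l + \varepsilon]$, a single Newton step is applied instead of the fixpoint $N^*$, the step is skipped when $0 \in F'(I)$, and a small padding `pad` crudely replaces outward rounding. The result keeps the original right bound, which is how the `⊎ r` of Figure 3 is interpreted here.

```python
from typing import Callable, Optional, Tuple

Interval = Tuple[float, float]
IFn = Callable[[Interval], Interval]


def contains_zero(iv: Interval) -> bool:
    return iv[0] <= 0.0 <= iv[1]


def newton_step(F: IFn, dF: IFn, I: Interval, pad: float = 1e-12) -> Optional[Interval]:
    """One step I ∩ (c - F(c)/F'(I)); returns I unchanged if 0 is in F'(I)."""
    d_lo, d_hi = dF(I)
    if d_lo <= 0.0 <= d_hi:
        return I
    c = 0.5 * (I[0] + I[1])
    f_lo, f_hi = F((c, c))
    q = (f_lo / d_lo, f_lo / d_hi, f_hi / d_lo, f_hi / d_hi)
    lo = max(I[0], c - max(q) - pad)
    hi = min(I[1], c - min(q) + pad)
    return (lo, hi) if lo <= hi else None


def lnar(F: IFn, dF: IFn, I: Optional[Interval], eps: float = 1e-9) -> Optional[Interval]:
    """Shrink the left bound of I to the leftmost zero of F; None if no zero."""
    if I is None or not contains_zero(F(I)):
        return None
    r = I[1]                                  # the right bound is never reduced
    J = newton_step(F, dF, I)
    if J is None or not contains_zero(F(J)):
        return None
    l, h = J
    if contains_zero(F((l, min(l + eps, h)))):
        return (l, r)                         # left bound sits on a zero
    m = 0.5 * (l + h)                         # SPLIT
    left = lnar(F, dF, (l, m), eps)
    if left is not None:
        return (left[0], r)
    right = lnar(F, dF, (m, h), eps)
    return (right[0], r) if right is not None else None


def F(I: Interval) -> Interval:               # interval extension of x^2 - 2
    lo, hi = I
    if lo >= 0.0:
        s = (lo * lo, hi * hi)
    elif hi <= 0.0:
        s = (hi * hi, lo * lo)
    else:
        s = (0.0, max(lo * lo, hi * hi))
    return (s[0] - 2.0, s[1] - 2.0)


def dF(I: Interval) -> Interval:              # interval extension of 2x
    return (2.0 * I[0], 2.0 * I[1])


result = lnar(F, dF, (-3.0, 3.0))
```

For $x^2 - 2$ on $[-3, 3]$ the left bound moves to approximately $-\sqrt{2}$ while the right bound stays at 3. A symmetric function for the right bound would play the role of RNAR.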

5.2  Box-Consistency  on  the  Natural  Interval  Extension 

We are now in a position to define the narrowing operator for the natural interval extension. The
basic idea is very simple. For a constraint $\langle f = 0, i \rangle$, it consists in taking an
interval extension of $f$ and replacing all variables $x_j$ but $x_i$ by their intervals $I_j$ to
obtain a univariate function. The leftmost and rightmost zeros are then computed to produce the
result.

Definition 13 [Narrowing Operator for the Natural Interval Extension] The narrowing operator
for the natural interval extension is defined as follows:

$$ \text{BOX-NARROW-NE}(\langle f = 0, i \rangle, \langle I_1, \ldots, I_n \rangle) = \langle I_1, \ldots, I_{i-1}, I, I_{i+1}, \ldots, I_n \rangle $$

where

$$ I = \text{LNAR}(F, F', \text{RNAR}(F, F', I_i)) $$

$$ F(X) = \hat{f}(I_1, \ldots, I_{i-1}, X, I_{i+1}, \ldots, I_n) $$

$$ F'(X) = \widehat{\frac{\partial f}{\partial x_i}}(I_1, \ldots, I_{i-1}, X, I_{i+1}, \ldots, I_n) $$

Example 7 Let $c$ be the constraint $x_1^2 + x_2^2 - 1 = 0$ and $\vec{I}$ be
$\langle [-1, 1], [-1, 1] \rangle$. The functions $F$ and $F'$ for $i = 1$ in the above definition
are defined as follows:

$$ F(X) = X^2 + [-1, 1]^2 - 1 $$

$$ F'(X) = 2X. $$

It is important to note that box-consistency on the natural extension (and on the distributed
extension as well) can be applied even if the function is not differentiable. It suffices to omit
the application of operator $N^*$ in the functions LNAR and RNAR.

5.3 Box-Consistency on the Distributed Interval Extension

We now turn to the narrowing operator for the distributed interval extension. Box-consistency on
the distributed interval extension can be enforced by using the univariate interval function

$$ F(X) = \tilde{f}(I_1, \ldots, I_{i-1}, X, I_{i+1}, \ldots, I_n) $$

(where $\tilde{f}$ denotes the distributed interval extension of $f$) and by searching its extreme
zeros. However, since the function is in distributed form, it is possible to do better by using an
idea from [11]. The key insight is to sandwich $F$ exactly between two univariate real functions
$f_l$ and $f_u$ defined as follows:

$$ f_l(x) = left(F(x)) $$

$$ f_u(x) = right(F(x)). $$

Note that box-consistency can now be enforced by searching the leftmost and rightmost zeros of
some interval extensions, say $F_l$ and $F_u$, of $f_l$ and $f_u$ using interval extensions, say
$F'_l$ and $F'_u$, of their derivatives $f'_l$ and $f'_u$. Of course, LNAR and RNAR can be used to
find these extreme zeros.

The key advantage of the distributed interval extension is that it is easy to define the functions
$f_l$ and $f_u$ constructively. Let $F$ be of the form

$$ F(X) = I_1 X^{n_1} + \ldots + I_p X^{n_p}. $$

Function $f_l$ is defined as

$$ f_l(x) = low(I_1, x, n_1) + \ldots + low(I_p, x, n_p) $$

where

$$ low(I, x, n) = \begin{cases} left(I)\, x^n & \text{if } x \ge 0 \vee n \text{ is even} \\ right(I)\, x^n & \text{otherwise.} \end{cases} $$

Function $f_u$ is defined as

$$ f_u(x) = high(I_1, x, n_1) + \ldots + high(I_p, x, n_p) $$

where

$$ high(I, x, n) = \begin{cases} right(I)\, x^n & \text{if } x \ge 0 \vee n \text{ is even} \\ left(I)\, x^n & \text{otherwise.} \end{cases} $$

It is easy to see that the two definitions of $f_l$ and $f_u$ (through $left(F(x))$ and
$right(F(x))$, and through $low$ and $high$) agree.

Example 8 Consider the function $x_1(x_1 - x_2) - 4$ and assume that $x_1$ and $x_2$ range over
$[0, 1]$. The distributed interval extension is

$$ X_1^2 - X_1 X_2 - 4. $$

The function $F$ obtained by projecting the distributed interval extension on variable $X_1$ is

$$ F(X) = X^2 - [0, 1] X - 4. $$

The corresponding functions $f_u$ and $f_l$ are, for $x \ge 0$,

$$ f_u(x) = x^2 - 4 $$

$$ f_l(x) = x^2 - x - 4. $$

Their natural interval extensions are of course

$$ F_u(X) = X^2 - 4 $$

$$ F_l(X) = X^2 - X - 4. $$

The narrowing operator can now be obtained by using the interval Newton method on the above
two functions. Some care must obviously be applied since $f_l$ and $f_u$ are not differentiable
at 0. The method is more efficient than applying the interval Newton method on the interval
function, since intervals have been replaced by numbers, increasing the precision of the Newton
operator $N^*$. We now present the narrowing operator.
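The constructive definitions of the lower and upper functions translate directly into code. The sketch below assumes that a distributed univariate interval function is stored as a list of `(coefficient interval, exponent)` terms. This representation and the function names are illustrative choices, not part of the original system.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Term = Tuple[Interval, int]  # (coefficient interval I_k, exponent n_k)


def low(I: Interval, x: float, n: int) -> float:
    """Lower bound of I * x^n for a real x."""
    coeff = I[0] if (x >= 0 or n % 2 == 0) else I[1]
    return coeff * x ** n


def high(I: Interval, x: float, n: int) -> float:
    """Upper bound of I * x^n for a real x."""
    coeff = I[1] if (x >= 0 or n % 2 == 0) else I[0]
    return coeff * x ** n


def f_l(terms: List[Term], x: float) -> float:
    return sum(low(I, x, n) for I, n in terms)


def f_u(terms: List[Term], x: float) -> float:
    return sum(high(I, x, n) for I, n in terms)


# F(X) = X^2 - [0, 1] X - 4, written as [1,1] X^2 + [-1,0] X + [-4,-4].
terms = [((1.0, 1.0), 2), ((-1.0, 0.0), 1), ((-4.0, -4.0), 0)]
```

At x = 0.5 this gives f_l = -4.25 and f_u = -3.75, which matches x^2 - x - 4 and x^2 - 4 on the nonnegative range. For a negative x the roles of the two bounds of the odd-degree coefficient are exchanged.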

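Definition 14 [Narrowing Operator for the Distributed Interval Extension] Let
$\langle f = 0, i \rangle$ be a projection constraint, let $\tilde{f}$ denote the distributed
interval extension of $f$, and let $F$, $f_l$, $f_u$, $F_l$, $F_u$, $F'_l$, $F'_u$, $l$, and $u$ be

$$ F(X) = \tilde{f}(I_1, \ldots, I_{i-1}, X, I_{i+1}, \ldots, I_n) $$

$$ f_l(x) = left(F(x)) $$

$$ f_u(x) = right(F(x)) $$

$$ F_l = \hat{f_l} \qquad F_u = \hat{f_u} $$

$$ F'_l = \hat{f'_l} \qquad F'_u = \hat{f'_u} $$

$$ l = left(I_i) $$

$$ u = right(I_i) $$

Assuming that $\langle I_1, \ldots, I_n \rangle$ is not empty, the narrowing operator for the
distributed interval extension is defined as follows:

$$ \text{BOX-NARROW-DE}(\langle f = 0, i \rangle, \langle I_1, \ldots, I_n \rangle) = \langle I_1, \ldots, I_{i-1}, [l_i, u_i], I_{i+1}, \ldots, I_n \rangle $$

where

$$ l_i = \begin{cases}
l & \text{if } 0 \in F([l, l^+]) \\
left(\text{LNAR}(F_l, F'_l, [l, 0]) \cup \text{LNAR}(F_l, F'_l, [0, u])) & \text{if } F([l, l^+]) > 0 \\
left(\text{LNAR}(F_u, F'_u, [l, 0]) \cup \text{LNAR}(F_u, F'_u, [0, u])) & \text{otherwise}
\end{cases} $$

$$ u_i = \begin{cases}
u & \text{if } 0 \in F([u^-, u]) \\
right(\text{RNAR}(F_l, F'_l, [l, 0]) \cup \text{RNAR}(F_l, F'_l, [0, u])) & \text{if } F([l, l^+]) > 0 \\
right(\text{RNAR}(F_u, F'_u, [l, 0]) \cup \text{RNAR}(F_u, F'_u, [0, u])) & \text{otherwise.}
\end{cases} $$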

5.4  Taylor  Interval  Extension 

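We conclude by presenting the narrowing operator for the Taylor interval extension.
Box-consistency on the Taylor interval extension can be enforced by using the interval function

$$ F(X) = f_t(I_1, \ldots, I_{i-1}, X, I_{i+1}, \ldots, I_n) $$

(where $f_t$ denotes the left-hand side of the Taylor interval extension) and by applying a simple
implementation of LNAR and RNAR where the Newton operator $N^*$ is omitted. However, it is possible
to do better by noticing that the constraint is of the form

$$ f(m_1, \ldots, m_n) + \sum_{j=1}^{i-1} \widehat{\frac{\partial f}{\partial x_j}}(\vec{I})\,(I_j - m_j) + \widehat{\frac{\partial f}{\partial x_i}}(I_1, \ldots, I_n)\,(X_i - m_i) + \sum_{j=i+1}^{n} \widehat{\frac{\partial f}{\partial x_j}}(\vec{I})\,(I_j - m_j) = 0 $$

where $m_i = center(I_i)$, and contains a single variable which can be isolated to compute
box-consistency directly.

Definition 15 [Narrowing Operator for the Taylor Interval Extension] The narrowing operator for
the Taylor interval extension is defined as follows:

$$ \text{BOX-NARROW-TE}(\langle f = 0, i \rangle, \langle I_1, \ldots, I_n \rangle) = \langle I_1, \ldots, I_{i-1}, I, I_{i+1}, \ldots, I_n \rangle $$

where

$$ I = I_i \cap \left( m_i - \frac{1}{\widehat{\frac{\partial f}{\partial x_i}}(I_1, \ldots, I_n)} \left( f(m_1, \ldots, m_n) + \sum_{j=1,\; j \ne i}^{n} \widehat{\frac{\partial f}{\partial x_j}}(I_1, \ldots, I_n)\,(I_j - m_j) \right) \right) $$

and $m_i = center(I_i)$.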
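Isolating the single variable of the Taylor form can be sketched as below. The sketch assumes that the caller supplies the real function `f` and one interval-valued function per partial derivative, that outward rounding is ignored, and that the box is returned unchanged when the interval partial derivative wrt the selected variable contains zero (no extended division). The name `box_narrow_te` mirrors the operator of the text but the code is only an illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Interval = Tuple[float, float]
Box = List[Interval]


def imul(a: Interval, b: Interval) -> Interval:
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def idiv(a: Interval, b: Interval) -> Interval:
    """Interval division, assuming 0 is not in b."""
    q = (a[0] / b[0], a[0] / b[1], a[1] / b[0], a[1] / b[1])
    return (min(q), max(q))


def box_narrow_te(f: Callable[[Sequence[float]], float],
                  grads: Sequence[Callable[[Box], Interval]],
                  box: Box, i: int) -> Optional[Box]:
    m = [0.5 * (lo + hi) for lo, hi in box]          # centers m_j
    d = [g(box) for g in grads]                      # interval partials on the box
    if d[i][0] <= 0.0 <= d[i][1]:
        return box                                   # cannot isolate X_i here
    fm = f(m)
    acc = (fm, fm)                                   # f(m) + sum_{j != i} d_j (I_j - m_j)
    for j, (lo, hi) in enumerate(box):
        if j != i:
            t = imul(d[j], (lo - m[j], hi - m[j]))
            acc = (acc[0] + t[0], acc[1] + t[1])
    q = idiv(acc, d[i])
    new_lo = max(box[i][0], m[i] - q[1])
    new_hi = min(box[i][1], m[i] - q[0])
    if new_lo > new_hi:
        return None                                  # empty: no solution in the box
    return box[:i] + [(new_lo, new_hi)] + box[i + 1:]


# x1^2 + x2^2 - 1 = 0 on the box x1 in [0.5, 1], x2 in [0, 0.25].
f = lambda x: x[0] ** 2 + x[1] ** 2 - 1.0
grads = [lambda b: (2.0 * b[0][0], 2.0 * b[0][1]),
         lambda b: (2.0 * b[1][0], 2.0 * b[1][1])]
narrowed = box_narrow_te(f, grads, [(0.5, 1.0), (0.0, 0.25)], 0)
```

On this box the interval for the first variable shrinks from [0.5, 1] to [0.9296875, 1], which still contains every point of the circle whose second coordinate lies in [0, 0.25].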


5.5  Existence  of  Solution 

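We now briefly describe how Newton proves the existence of solutions. No special effort has been
devoted to this topic and the techniques could certainly be improved in various ways. Let
$\{f_1 = 0, \ldots, f_n = 0\}$ be a conditioned system of equations over variables
$\{x_1, \ldots, x_n\}$, let $\langle I_1, \ldots, I_n \rangle$ be a box and define the intervals
$I'_i$ ($1 \le i \le n$) as follows:

$$ I'_i = m_i - \frac{1}{\widehat{\frac{\partial f_i}{\partial x_i}}(I_1, \ldots, I_n)} \left( f_i(m_1, \ldots, m_n) + \sum_{j=1,\; j \ne i}^{n} \widehat{\frac{\partial f_i}{\partial x_j}}(I_1, \ldots, I_n)\,(I_j - m_j) \right) $$

and $m_i = center(I_i)$. If

$$ \langle I'_1, \ldots, I'_n \rangle \subseteq \langle I_1, \ldots, I_n \rangle $$

then there exists a unique zero in $\langle I'_1, \ldots, I'_n \rangle$. A proof of this result can
be found in [26], where credit is given to Moore and Nickel. Note also that the intervals $I'_i$
have to be computed anyway for box-consistency on the Taylor interval extension.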


5.6  Implementation  Issues 

We  now  review  some  implementation  issues  which  arise  in  programming  the  algorithm. 

Priorities In the pruning algorithm, it is important for efficiency reasons to use a priority queue
to ensure that projections over the distributed interval extension are selected before projections
over the natural interval extension. Newton also does not enforce box-consistency on the
distributed version whenever it is believed to lose too much precision (e.g., an expression raised
to some power).

Precision In practice, it is often sufficient to return intervals whose widths⁸ are within the
desired accuracy instead of returning intervals of the form $[l, l^+]$. It is easy to modify the
BRANCH operation to split only intervals whose width is above the required accuracy. Our system
allows users to specify the accuracy.

Improvement  Factor  Box-consistency  can  sometimes  take  much  time  to  remove  small  parts  of 
the  intervals.  In  these  cases,  it  is  probably  more  cost-effective  to  branch.  Once  again,  it  is  easy 
to  modify  the  algorithm  to  avoid  this  problem  by  making  sure  that  the  narrowing  operators  do 
not  update  the  intervals  unless  some  significant  reduction  has  taken  place.  Since  the  notion  of 
significant  reduction  may  be  problem-dependent,  our  system  lets  users  specify  the  improvement 
factor  necessary  to  update  an  interval  in  a  projection. 

Automatic Differentiation As mentioned, our algorithm takes a very simple approach to obtain
partial derivatives, i.e., no effort is spent in factoring common expressions to reduce the
dependency problem of interval arithmetic. The main reason is that we are using automatic
differentiation [28] to evaluate the derivatives together with the functions. This choice may be
reconsidered in a future version of the system.

Inequalities  It  is  simple  to  generalize  the  above  algorithms  for  inequalities.  In  general,  it  suffices 
to  test  if  the  inequality  is  satisfied  at  the  end  points  of  the  interval.  If  it  is  not,  then  the  problem 
reduces  once  again  to  finding  the  leftmost  and/or  rightmost  zeros. 

6  Experimental  Results 

This section reports experimental results of Newton on a variety of standard benchmarks. The
benchmarks were taken from papers on numerical analysis [23], interval analysis [8, 11, 22], and
continuation methods [35, 25, 24, 18]. We also compare Newton with a traditional interval method
using the Hansen-Sengupta operator, range testing, and branching. This method uses the same
implementation technology as Newton and is denoted by HRB in the following.⁹ Finally, we compare
Newton with a state-of-the-art continuation method [35], denoted by CONT in the following. Note
that all results given in this section were obtained by running Newton on a Sun Sparc 10
workstation to obtain all solutions. In addition, the final intervals must have widths smaller than
$10^{-8}$ and Newton always uses an improvement factor of 10%. The results are summarized in
Table 1. For each benchmark, we give the number of variables (n), the total degree of the system
(d), the initial range for the variables, and the results of each method in seconds. Note that the
times for the continuation method are on a DEC 5000/200. A space in a column means that the result
is not available for the method. A question mark means that the method does not terminate in a
reasonable time (> 1 hour). The rest of the section describes each benchmark and the results in
much more detail. For each benchmark, we report the CPU times in seconds, the growth of the
CPU time, the number of branch operations branching, the number of narrowings on the various
extensions na-ne, na-ee, na-te, the total number of narrowings na-tot, the number of function
evaluations (including evaluation of derivatives, which are counted as normal function evaluations)
for each of the extensions fe-ne, fe-ee, fe-te, and the total number of function evaluations
fe-tot. We also indicate the number of preconditionings by pr-con and whether the algorithm can
prove the existence of the solutions in the resulting intervals by proof.

Benchmarks      n     d        range              Newton      HRB        CONT
Broyden         10    3^10     [-1, 1]              1.65      18.23
Broyden         20    3^20     [-1, 1]              4.25      ?
Broyden         320   3^320    [-1, 1]            113.71      ?
Broyden         320   3^320    [-10^8, 10^8]      143.40      ?
More-Cosnard    20    3^20     [-4, 5]             24.49      968.25
More-Cosnard    40    3^40     [-4, 5]            192.81      ?
More-Cosnard    80    3^80     [-4, 5]           1752.64      ?
More-Cosnard    80    3^80     [-10^8, 0]        1735.09      ?
i1              10    3^10     [-2, 2]              0.06      14.28
i2              20    3^20     [-1, 2]              0.30      1821.23
i3              20    3^20     [-2, 2]              0.31      5640.80
i4              10    6^10     [-1, 1]             73.94      445.28
i5              10    11^10    [-1, 1]              0.08      33.58
kin1            12    4608     [-10^8, 10^8]       14.24      1630.08
kin2            8     256      [-10^8, 10^8]      353.06      4730.34    35.61
eco             4     18       [-10^8, 10^8]        0.60      2.44       1.13
eco             5     54       [-10^8, 10^8]        3.35      29.88      5.87
eco             6     162      [-10^8, 10^8]       22.53      ?          50.18
eco             7     486      [-10^8, 10^8]      127.65      ?          991.45
eco             8     1458     [-10^8, 10^8]      915.24      ?
eco             9     4374     [-10^8, 10^8]     8600.28      ?
combustion      10    96       [-10^8, 10^8]        9.94      ?          57.40
chemistry       5     108      [0, 10^8]            6.32      ?          56.55
neuro           6     1024     [-10, 10]            0.91      28.84      5.02
neuro           6     1024     [-1000, 1000]      172.71      ?          5.02

Table 1: Summary of the Experimental Results

⁸ The width of $[l, u]$ is $u - l$.

⁹ Some interval methods such as [7] are more sophisticated than HRB but the sophistication aims at
speeding up the computation near a solution. Our main contribution is completely orthogonal and
aims at speeding up the computation when far from a solution and hence comparing it to HRB is
meaningful.

6.1  Broyden  Banded  Functions 

This is a traditional benchmark of interval techniques and was used for instance in [7]. It consists
in finding the zeros of the functions

$$ f_i(x_1, \ldots, x_n) = x_i (2 + 5 x_i^2) + 1 - \sum_{j \in J_i} x_j (1 + x_j) \qquad (1 \le i \le n) $$

where $J_i = \{ j \mid j \ne i \;\&\; \max(1, i - 5) \le j \le \min(n, i + 1) \}$. One of the
interesting features of this benchmark is that it is easy to scale up to an arbitrary dimension and
hence provides a good basis to compare various methods. Table 2 reports the results of our
algorithm for various sizes assuming initial intervals $[-1, 1]$.
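For readers who want to experiment with this benchmark, the functions can be evaluated with a few lines of Python. The sketch below uses the band definition given above (five lower neighbours, one upper neighbour). The point `x10` is an assumption of this example: its components are rounded from the n = 10 intervals reported in this section, so the residuals are small but not exactly zero.

```python
from typing import List, Sequence


def broyden_banded(x: Sequence[float]) -> List[float]:
    """Evaluate f_1, ..., f_n of the Broyden banded benchmark at the point x."""
    n = len(x)
    values = []
    for i in range(1, n + 1):                      # 1-based index as in the text
        xi = x[i - 1]
        band = sum(x[j - 1] * (1.0 + x[j - 1])
                   for j in range(max(1, i - 5), min(n, i + 1) + 1)
                   if j != i)
        values.append(xi * (2.0 + 5.0 * xi * xi) + 1.0 - band)
    return values


# Approximate solution for n = 10, rounded to 8 decimals.
x10 = [-0.42830286, -0.47659642, -0.51965246, -0.55809932, -0.59250616,
       -0.62450368, -0.62323947, -0.62139384, -0.62045360, -0.58646927]
residuals = broyden_banded(x10)
```

Each function depends on at most seven variables, which is the sparsity that makes the benchmark easy to scale to large n.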

n            5       10       20       40        80        160       320
CPU time     0.20    1.65     4.25     9.79      22.13     48.30     113.71
growth               8.25     2.57     2.30      2.26      2.18      2.35
branching    0       0        0        0         0         0         0
na-ne        57      260      661      1607      4351      8096      17126
na-ee        1226    8334     21236    48540     102797    206926    414798
na-te        35      110      260      560       1200      2400      4480
na-tot       1318    8704     22157    50707     108348    217422    436404
fe-ne        81      1828     2943     5103      11993     20431     42125
fe-ee        3462    21518    53722    121984    257398    517640    1036210
fe-te        95      320      920      2720      8800      30400     111360
fe-tot       3638    23666    57585    129807    278191    568471    1189695
pr-con       0       0        0        0         0         0         0
proof        yes     yes      yes      yes       yes       yes       yes

Table 2: Newton on the Broyden Banded functions with initial intervals [-1, 1]


n            5       10       20       40        80        160       320
CPU time     0.31    2.15     5.09     12.49     27.69     61.60     143.40
growth               6.93     2.36     2.45      2.21      2.22      2.32
branching    0       0        0        0         0         0         0
pr-con       0       0        0        0         0         0         0
proof        yes     yes      yes      yes       yes       yes       yes

Table 3: Newton on the Broyden Banded functions with initial intervals [-10^8, 10^8]


The results indicate that Newton solves the problem using only constraint propagation: no
branching is needed. In addition, the growth of the computation times is very low and indicates
that Newton is essentially linear and can thus solve very large instances of this problem. Finally,
Newton proves the existence of a solution in the final intervals. To our knowledge, no other
algorithm has all these functionalities. Table 3 shows the same results when the initial intervals
are $[-10^8, 10^8]$. They indicate that the CPU time increases only slightly in this problem when
the initial intervals become substantially larger. It is interesting to note that substantial
pruning is obtained by box-consistency on the natural and distributed extensions alone. For
n = 10, maximal box-consistency on these two extensions produces the intervals

[-0.4283028737061274627, -0.4283028534683728794]
[-0.4765964317901201786, -0.4765964169224605195]
[-0.5196524683730758821, -0.5196524589206473754]
[-0.5580993358758108425, -0.5580993137885511545]
[-0.5925061654931400579, -0.5925061481657747375]
[-0.6245036923913307448, -0.6245036720076052594]
[-0.6232394806883442274, -0.6232394621928379896]
[-0.6213938520278742273, -0.6213938315652728361]
[-0.6204536054436834425, -0.6204535878744913413]
[-0.5864692773020701023, -0.5864692641387999616]


which have widths lower than $10^{-6}$. Note that the Hansen-Sengupta operator alone does not
produce any pruning initially and returns the initial intervals whether they be of the form
$[-10^8, 10^8]$ or $[-1, 1]$. This indicates that box-consistency on the natural and distributed
interval extensions is particularly effective when far from a solution while box-consistency on the
Taylor extension (and the Hansen-Sengupta operator) is effective when near a solution.

It is also interesting to stress the importance of box-consistency on the natural extension in this
example to reduce the growth factor. Without it, the algorithm takes about 48 and 440 seconds
instead of 27 and 61 for Newton for n = 80 and n = 160, since the distributed interval extension
loses precision due to the dependency problem.

Finally, it is interesting to compare Newton with traditional interval methods. HRB takes 0.34
seconds on n = 5 with 18 branchings, about 18 seconds for n = 10 with about 300 branchings, and
does not return after more than an hour on n = 20.

6.2 Discretization of a Nonlinear Integral Equation

This example comes from [23] and is also a standard benchmark for nonlinear equation solving. It
consists in finding the root of the functions $f_k(x_1, \ldots, x_m)$ ($1 \le k \le m$) defined as

$$ f_k(x_1, \ldots, x_m) = x_k + \frac{1}{2(m+1)} \left[ (1 - t_k) \sum_{j=1}^{k} t_j (x_j + t_j + 1)^3 + t_k \sum_{j=k+1}^{m} (1 - t_j)(x_j + t_j + 1)^3 \right] $$

where $t_j = jh$ and $h = 1/(m+1)$. These functions come from the discretization of a nonlinear
integral equation, giving a constraint system denser than the sparse constraint system for the
Broyden banded functions. The variables $x_i$ were given initial domains $[-4, 5]$ as in [29] and
the computation results are given in Table 4.

n            5       10       20        40        80
CPU time     0.75    4.07     24.49     192.81    1752.64
growth               5.42     6.02      7.87      9.08
branching    0       0        0         0         0
na-ne        3663    12616    46555     213949    1236532
na-te        104     255      505       1005      2807
na-tot       3767    12871    47060     214954    1239339
fe-ne        8775    31837    107977    466595    2586907
fe-te        884     3111     11211     42411     166415
fe-tot       9659    34948    119188    509006    2753322
fe-grow              3.62     3.41      4.27      5.40
pr-con       1       1        1         1         1
proof        yes     no       no        no        no

Table 4: Newton on the More-Cosnard nonlinear integral equation with initial intervals [-4, 5]


n            5       10       20        40        80
CPU time     0.70    3.82     20.81     189.94    1735.09
growth               5.45     5.44      9.12      9.13
branching    0       0        0         0         0
pr-con       0       0        0         0         0
proof        yes     no       no        no        no

Table 5: Newton on the More-Cosnard nonlinear integral equation with initial intervals [-10^8, 0]
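The discretized functions are straightforward to evaluate. The following sketch assumes the standard form of this benchmark (cubic terms, $t_j = jh$, $h = 1/(m+1)$) and uses 0-based Python indices. The function name `more_cosnard` is a choice made for this example.

```python
from typing import List, Sequence


def more_cosnard(x: Sequence[float]) -> List[float]:
    """Evaluate f_1, ..., f_m of the discretized integral equation at x."""
    m = len(x)
    h = 1.0 / (m + 1)
    t = [j * h for j in range(1, m + 1)]                 # t_j = j * h
    cube = [(x[j] + t[j] + 1.0) ** 3 for j in range(m)]  # (x_j + t_j + 1)^3
    values = []
    for k in range(m):                                   # index k here is k+1 in the text
        s1 = sum(t[j] * cube[j] for j in range(k + 1))
        s2 = sum((1.0 - t[j]) * cube[j] for j in range(k + 1, m))
        values.append(x[k] + 0.5 * h * ((1.0 - t[k]) * s1 + t[k] * s2))
    return values
```

Unlike the Broyden banded functions, every $f_k$ depends on all m variables, so the constraint system is dense and each narrowing step touches every variable.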

Once again, it is interesting to note that Newton is completely deterministic on this problem,
i.e., it does not do any branching. Newton is probably cubic in the number of variables for this
problem. It is important to point out the critical role of box-consistency on the natural extension
to solve this problem efficiently. Newton without the natural extension would not be deterministic
and would slow down exponentially, since box-consistency on the distributed extension loses too
much precision due to the dependency problem (multiple occurrences of the same variable) and
box-consistency on the Taylor interval extension is not helpful initially. Once again, we observe
that box-consistency over the natural extension is helpful when far from a solution while
box-consistency on the Taylor extension is useful to terminate the search quickly. Table 5 gives
the results for the initial intervals $[-10^8, 0]$, which shows that the algorithm continues to
perform well in this case. Finally, Table 6 gives the results for the HRB algorithm on this
problem. Once again, Newton outperforms the HRB method substantially.

n            5        (second column)
branching    5        24
fe-tot       20194    1285764
pr-con       7        32
proof        no       no

(Only the first two columns of this table are legible in the source. The CPU time row and the
entries for the larger instances could not be recovered.)

Table 6: The HRB algorithm on the More-Cosnard nonlinear integral equation with initial intervals
[-4, 5]

6.3  Interval  Arithmetic  Benchmarks 

This section considers standard benchmarks from interval arithmetic papers [21, 11]. Benchmark
i1 is the following set of equations

\[
\begin{cases}
0 = x_1 - 0.25428722 - 0.18324757\, x_4 x_3 x_9 \\
0 = x_2 - 0.37842197 - 0.16275449\, x_1 x_{10} x_6 \\
0 = x_3 - 0.27162577 - 0.16955071\, x_1 x_2 x_{10} \\
0 = x_4 - 0.19807914 - 0.15585316\, x_7 x_1 x_6 \\
0 = x_5 - 0.44166728 - 0.19950920\, x_7 x_6 x_3 \\
0 = x_6 - 0.14654113 - 0.18922793\, x_8 x_5 x_{10} \\
0 = x_7 - 0.42937161 - 0.21180486\, x_2 x_5 x_8 \\
0 = x_8 - 0.07056438 - 0.17081208\, x_1 x_7 x_6 \\
0 = x_9 - 0.34504906 - 0.19612740\, x_{10} x_6 x_8 \\
0 = x_{10} - 0.42651102 - 0.21466544\, x_4 x_8 x_1
\end{cases}
\]
with initial intervals [-2,2]. Benchmark i2 is the set of equations

\[
\begin{cases}
0 = x_1 - 0.24863995 - 0.19594124\, x_7 x_{10} x_{16} \\
0 = x_2 - 0.87528587 - 0.05612619\, x_{18} x_8 x_{11} \\
0 = x_3 - 0.23939835 - 0.20177810\, x_{10} x_7 x_{11} \\
0 = x_4 - 0.47620128 - 0.16497518\, x_{12} x_{15} x_1 \\
0 = x_5 - 0.24711044 - 0.20198178\, x_8 x_9 x_{16} \\
0 = x_6 - 0.33565227 - 0.15724045\, x_{16} x_{18} x_{11} \\
0 = x_7 - 0.13128974 - 0.12384342\, x_{12} x_{13} x_{15} \\
0 = x_8 - 0.45937304 - 0.18180253\, x_{19} x_{15} x_{18} \\
0 = x_9 - 0.46896600 - 0.21241045\, x_{13} x_2 x_{17} \\
0 = x_{10} - 0.57596835 - 0.16522613\, x_{12} x_9 x_{13} \\
0 = x_{11} - 0.56896263 - 0.17221383\, x_{16} x_{17} x_8 \\
0 = x_{12} - 0.70561396 - 0.23556251\, x_{14} x_{11} x_4 \\
0 = x_{13} - 0.59642512 - 0.24475135\, x_7 x_{16} x_{20} \\
0 = x_{14} - 0.46588640 - 0.21790395\, x_{13} x_3 x_{10} \\
0 = x_{15} - 0.10607114 - 0.20920602\, x_1 x_9 x_{10} \\
0 = x_{16} - 0.26516898 - 0.21037773\, x_4 x_{19} x_9 \\
0 = x_{17} - 0.20436664 - 0.19838792\, x_{20} x_{10} x_{13} \\
0 = x_{18} - 0.56003141 - 0.18114505\, x_6 x_{13} x_8 \\
0 = x_{19} - 0.92894617 - 0.04417537\, x_7 x_{13} x_{16} \\
0 = x_{20} - 0.57001682 - 0.17949149\, x_1 x_3 x_{11}
\end{cases}
\]


with initial intervals [-1,2]. Benchmark i3 has the same set of equations as i2 but has initial
intervals [-2,2]. Benchmark i4 has the set of equations

\[
\begin{cases}
0 = x_1^2 - 0.25428722 - 0.18324757\, x_4^2 x_3^2 x_9^2 \\
0 = x_2^2 - 0.37842197 - 0.16275449\, x_1^2 x_{10}^2 x_6^2 \\
0 = x_3^2 - 0.27162577 - 0.16955071\, x_1^2 x_2^2 x_{10}^2 \\
0 = x_4^2 - 0.19807914 - 0.15585316\, x_7^2 x_1^2 x_6^2 \\
0 = x_5^2 - 0.44166728 - 0.19950920\, x_7^2 x_6^2 x_3^2 \\
0 = x_6^2 - 0.14654113 - 0.18922793\, x_8^2 x_5^2 x_{10}^2 \\
0 = x_7^2 - 0.42937161 - 0.21180486\, x_2^2 x_5^2 x_8^2 \\
0 = x_8^2 - 0.07056438 - 0.17081208\, x_1^2 x_7^2 x_6^2 \\
0 = x_9^2 - 0.34504906 - 0.19612740\, x_{10}^2 x_6^2 x_8^2 \\
0 = x_{10}^2 - 0.42651102 - 0.21466544\, x_4^2 x_8^2 x_1^2
\end{cases}
\]


and initial intervals [-1,1]. The number of solutions must be a multiple of 1024. Benchmark i5
has the following set of equations

\[
\begin{cases}
0 = x_1 - 0.25428722 - 0.18324757\, x_4^3 x_3^3 x_9^3 + x_3^4 x_9^7 \\
0 = x_2 - 0.37842197 - 0.16275449\, x_1^3 x_{10}^3 x_6^3 + x_{10}^4 x_6^7 \\
0 = x_3 - 0.27162577 - 0.16955071\, x_1^3 x_2^3 x_{10}^3 + x_2^4 x_{10}^7 \\
0 = x_4 - 0.19807914 - 0.15585316\, x_7^3 x_1^3 x_6^3 + x_1^4 x_6^7 \\
0 = x_5 - 0.44166728 - 0.19950920\, x_7^3 x_6^3 x_3^3 + x_6^4 x_3^7 \\
0 = x_6 - 0.14654113 - 0.18922793\, x_8^3 x_5^3 x_{10}^3 + x_5^4 x_{10}^7 \\
0 = x_7 - 0.42937161 - 0.21180486\, x_2^3 x_5^3 x_8^3 + x_5^4 x_8^7 \\
0 = x_8 - 0.07056438 - 0.17081208\, x_1^3 x_7^3 x_6^3 + x_7^4 x_6^7 \\
0 = x_9 - 0.34504906 - 0.19612740\, x_{10}^3 x_6^3 x_8^3 + x_6^4 x_8^7 \\
0 = x_{10} - 0.42651102 - 0.21466544\, x_4^3 x_8^3 x_1^3 + x_8^4 x_1^7
\end{cases}
\]
             i1      i2      i3      i4         i5
CPU time     0.06    0.30    0.31    73.94      0.08
branching    0       0       0       1023       0
na-ee        625     2107    1949    290776     381
na-te        0       80      80      37930      21
na-tot       625     2187    2029    328706     401
fe-ee        1760    5698    5318    752220     992
fe-te        0       560     560     269760     140
fe-tot       1760    6258    5878    1021980    1132
pr-con       0       1       1       1939       1
proof        yes     yes     yes     yes        yes

Table 7: Newton on the Traditional Interval Arithmetic Benchmarks


             i1            i2         i3          i4         i5
CPU time     14.28         1821.23    5640.80     445.28     33.58
branching    498           9031       36933       11263      1173
fe-tot       (illegible)   6441640    19979025    2554066    154948
pr-con       (illegible)   12817      42193       14335      1211
proof        (illegible)   yes        yes         yes        yes

Table 8: HRB on the Traditional Interval Arithmetic Benchmarks


and initial intervals [-1,1].

Newton solves all the problems with one solution without branching and solves the problem
having 1024 solutions with 1023 branchings. Note also that box-consistency on the distributed
extension solves benchmark i1 alone. The results once again confirm our observation on when the
various extensions are useful. Closely related results were observed in [11] on these benchmarks
(see the related work section for a more detailed comparison) but our algorithm is in general
about 4 times faster (assuming similar machines) and does not do any branching on i5. Table
8 also describes the results for the traditional interval arithmetic method. The importance of
box-consistency on the distributed extension can easily be seen from these results. Note also that
Newton (and interval methods) can prove the existence of a solution in the final intervals for all
these problems.

It is also interesting to note that problem i4 can be solved dramatically more efficiently simply
by introducing intermediary variables y_i = x_i^2. The execution times then dropped to less than 0.5
seconds.

6.4  Kinematics  Applications 

We now describe the performance of Newton on two kinematics examples. Application kin1 comes
from robotics and describes the inverse kinematics of an elbow manipulator [11]. It consists of a
sparse system with 12 variables and the set of equations is as follows:

\[
\begin{cases}
s_2 c_5 s_6 - s_3 c_5 s_6 - s_4 c_5 s_6 + c_2 c_6 + c_3 c_6 + c_4 c_6 = 0.4077 \\
c_1 c_2 s_5 + c_1 c_3 s_5 + c_1 c_4 s_5 + s_1 c_5 = 1.9115 \\
s_2 s_5 + s_3 s_5 + s_4 s_5 = 1.9791 \\
c_1 c_2 + c_1 c_3 + c_1 c_4 + c_1 c_2 + c_1 c_3 + c_1 c_2 = 4.0616 \\
s_1 c_2 + s_1 c_3 + s_1 c_4 + s_1 c_2 + s_1 c_3 + s_1 c_2 = 1.7172 \\
s_2 + s_3 + s_4 + s_2 + s_3 + s_2 = 3.9701 \\
s_i^2 + c_i^2 = 1 \quad (1 \le i \le 6).
\end{cases}
\]

  k     i = 1           i = 2           i = 3           i = 4
  1    -0.249150680    +0.125016350    -0.635550070    +1.48947730
  2    +1.609135400    -0.686607360    -0.115719920    +0.23062341
  3    +0.279423430    -0.119228120    -0.666404480    +1.32810730
  4    +1.434801600    -0.719940470    +0.110362110    -0.25864503
  5    +0.000000000    -0.432419270    +0.290702030    +1.16517200
  6    +0.400263840    +0.000000000    +1.258776700    -0.26908494
  7    -0.800527680    +0.000000000    -0.629388360    +0.53816987
  8    +0.000000000    -0.864838550    +0.581404060    +0.58258598
  9    +0.074052388    -0.037157270    +0.195946620    -0.20816985
 10    -0.083050031    +0.035436896    -1.228034200    +2.68683200
 11    -0.386159610    +0.085383482    +0.000000000    -0.69910317
 12    -0.755266030    +0.000000000    -0.079034221    +0.35744413
 13    +0.504201680    -0.039251967    +0.026387877    +1.24991170
 14    -1.091628700    +0.000000000    -0.057131430    +1.46773600
 15    +0.000000000    -0.432419270    -1.162808100    +1.16517200
 16    +0.049207290    +0.000000000    +1.258776700    +1.07633970
 17    +0.049207290    +0.013873010    +2.162575000    -0.69686809

Table 9: Coefficients a_ki for the Inverse Kinematics Example.

The  second  benchmark,  denoted  by  kin2,  is  from  [24]  and  describes  the  inverse  position  problem 
for  a  six-revolute-joint  problem  in  mechanics.  The  equations  which  describe  a  denser  constraint 
system  are  as  follows: 

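\[
\begin{cases}
x_i^2 + \text{[term illegible in source]} - 1 = 0 \quad (1 \le i \le 4) \\
a_{1i} x_1 x_3 + a_{2i} x_1 x_4 + a_{3i} x_2 x_3 + a_{4i} x_2 x_4 + a_{5i} x_5 x_7 + a_{6i} x_5 x_8 + a_{7i} x_6 x_7 + a_{8i} x_6 x_8 \\
\quad +\, a_{9i} x_1 + a_{10i} x_2 + a_{11i} x_3 + a_{12i} x_4 + a_{13i} x_5 + a_{14i} x_6 + a_{15i} x_7 + a_{16i} x_8 + a_{17i} = 0 \quad (1 \le i \le 4)
\end{cases}
\]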

where  the  coefficients  aki  are  given  in  table  9.  In  both  examples,  the  initial  intervals  were  given  as 

The results of Newton on these two benchmarks are given in Table 10. Newton is fast on the
first benchmark and does not branch much to obtain all solutions. The algorithm in [11] branches
more (the reported figure is 257 branches but it is not really comparable due to the nature of the
algorithm) and is about 16 times slower on comparable machines. We are not aware of the results
of continuation methods on this problem. Newton is slower on the second application and takes
about 6 minutes. The continuation method described in [35] requires about 30 seconds on a DEC
5000/200. This method exploits the fact that the Newton polytopes for the last 4 equations are
the same. Note that HRB requires about 1630 and 4730 seconds on these examples. Note also that
             kin1      kin2
CPU time     14.240    353.06
branching    89        5693
na-ee        17090     784687
na-te        10176     123032
na-tot       27266     907719
fe-ee        45656     1714779
fe-te        62080     854384
fe-tot       107736    2569163
pr-con       163       9505
proof        yes       yes

Table 10: Newton on the Kinematics Benchmarks


n            4       5        6         7         8          9
CPU time     0.21    1.22     8.20      46.59     352.80     3311.42
growth               5.80     6.72      5.68      7.57       9.38
branching    24      119      517       2231      12248      82579
na-ee        834     4701     42481     214430    1622417    14031838
na-te        480     1976     6325      29238     157402     1219960
na-tot       1314    6677     48806     243668    1779819    15251798
fe-ee        2220    11628    99994     489894    3626847    30782836
fe-te        1293    6304     29825     160284    1062789    9118448
fe-tot       3513    17932    129819    650178    4689636    39901284
pr-con       37      147      687       2828      15265      104352
proof        yes     yes      yes       yes       yes        yes

Table 11: Newton on the economics modelling problem with initial intervals in [-100, 100]


Newton  can  prove  the  existence  of  solutions  in  the  final  intervals  for  these  problems  and  that  our 
computer  representation  uses  intermediate  variables 

^13  =  2:4  +  0^6  +  2^8 

^  Xi4  =  0:3  +  +  2^7 

2>15  “  2/4  2^6  273 

X\Q  =  3  *  3)4  +  2  *  Xe  +  2^8 

to improve efficiency slightly in the first problem.

6.5  An  Economics  Modelling  Application 

The following example is taken from [25]. It is a difficult economic modelling problem that can
be scaled up to arbitrary dimensions. For a given dimension n, the problem can be stated as the
system

\[
\begin{cases}
\left( x_k + \sum_{i=1}^{n-k-1} x_i x_{i+k} \right) x_n - c_k = 0 \quad (1 \le k \le n-1) \\
\sum_{l=1}^{n-1} x_l + 1 = 0
\end{cases}
\]

and the constants can be chosen at random.

Table  11  reports  the  results  for  various  values  of  n  with  an  initial  interval  of  [-100, 100].  It  is 
interesting  to  compare  those  results  with  the  continuation  methods  presented  in  [35].  [35]  reports 
n            4        5        6         7          8           9
CPU time     0.60     3.35     22.53     127.65     915.24      8600.28
growth                5.58     6.72      5.66       7.16        9.39
branching    102      500      1778      7527       38638       244263
na-ee        2689     15227    122662    606805     4413150     36325819
na-te        978      3296     14055     64632      366436      2647016
na-tot       3667     18523    136717    671437     4779586     38972835
fe-ee        7140     38160    291206    1400216    9913850     80264897
fe-te        3288     15024    80110     429396     2825368     22672208
fe-tot       10428    53184    371716    1829612    12739218    102937105
pr-con       148      527      2080      8337       42704       271534
proof        yes      yes      yes       yes        yes         yes

Table 12: Newton on the economics modelling problem with initial intervals in [-10^8, 10^8]

times (on a DEC-5000/200) of about 1 second for n = 4, 6 seconds for n = 5, 50 seconds for n = 6
and 990 seconds for n = 7. Newton is substantially faster on this problem than this continuation
method, since it takes about 47 seconds for n = 7. More importantly, the growth factor seems
much lower in Newton. The continuation method has growths of about 8 and 20 when going from 5
to 6 and 6 to 7, while Newton has growths of about 6.72 and 5.68. Table 12 gives the same results
for initial intervals in [-10^8, 10^8]. It is interesting to note that the computation times increase by
less than a factor of 3 and that the growth factor is independent of the initial intervals. Note also
that Newton can establish the existence of solutions for these problems. Finally, it is worthwhile
stating that the results were obtained for a computer representation where x_n has been eliminated
in a problem of dimension n.
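The growth factors quoted above are simply ratios of consecutive CPU times. As a small check, the sketch below recomputes them from the CPU times reported in Table 11; the function name `growth_factors` is ours, introduced only for illustration.

```python
# CPU times (seconds) from Table 11, keyed by problem dimension n.
cpu_times = {4: 0.21, 5: 1.22, 6: 8.20, 7: 46.59, 8: 352.80, 9: 3311.42}


def growth_factors(times):
    """Return {n: time(n) / time(n - 1)} for consecutive dimensions."""
    dims = sorted(times)
    return {n: times[n] / times[prev] for prev, n in zip(dims, dims[1:])}


growth = growth_factors(cpu_times)
for n, factor in growth.items():
    print(f"n = {n}: growth {factor:.2f}")
```

The ratios for n = 6 and n = 7 agree with the 6.72 and 5.68 cited in the comparison with the continuation method.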

6.6  Combustion  Application 

This problem is also from Morgan's book [25] and represents a combustion problem for a temperature
of 3000°. The problem is described by the following sparse system of equations

\[
\begin{cases}
x_2 + 2 x_6 + x_9 + 2 x_{10} = 10^{-5} \\
x_3 + x_8 = 3 \cdot 10^{-5} \\
x_1 + x_3 + 2 x_5 + 2 x_8 + x_9 + x_{10} = 5 \cdot 10^{-5} \\
x_4 + 2 x_7 = 10^{-5} \\
0.5140437 \cdot 10^{-7}\, x_5 = x_1^2 \\
0.1006932 \cdot 10^{-6}\, x_6 = 2 x_2^2 \\
0.7816278 \cdot 10^{-15}\, x_7 = x_4^2 \\
0.1496236 \cdot 10^{-6}\, x_8 = x_1 x_3 \\
0.6194411 \cdot 10^{-7}\, x_9 = x_1 x_2 \\
0.2089296 \cdot 10^{-14}\, x_{10} = x_1 x_2^2
\end{cases}
\]

which is typical of chemical equilibrium systems. Table 13 describes the results of Newton for the
initial intervals [-1, 1] and [-10^8, 10^8]. Newton behaves well on this example, since the continuation
method of [35] takes about 57 seconds. Note once again that a substantial increase in the size of
the initial intervals only induces a slowdown of about 2.5 for Newton. Note also that Newton can
prove the existence of the solutions and that we use a formulation where variables x_7 and x_3 have
been eliminated.
             [-1,1]    [-10^8,10^8]
CPU time     4.06      9.94
branching    183       523
na-ee        10676     28473
na-te        5248      7952
na-tot       15924     36425
fe-ee        28928     77508
fe-te        22464     49632
fe-tot       51392     127140
pr-con       187       527
proof        yes       yes

Table 13: Newton on the Combustion Problem.


CPU time   branching   na-ee   na-te   na-tot   fe-ee   fe-te   fe-tot   pr-con   proof
6.32       256         13725   3400    17125    34811   17425   52236    425      yes

Table 14: Newton on the Chemistry Problem with Initial Intervals [0, 10^8]


6.7  Chemical  Equilibrium  Application 

This problem originates from [18] and describes a chemical equilibrium system. The set of equations
is as follows:

\[
\begin{cases}
R = 10 \\
R_5 = 0.193 \\
R_6 = 0.002597 / \sqrt{40} \\
R_7 = 0.003448 / \sqrt{40} \\
R_8 = 0.00001799 / 40 \\
R_9 = 0.0002155 / \sqrt{40} \\
R_{10} = 0.00003846 / 40 \\
x_1 x_2 + x_1 - 3 x_5 = 0 \\
2 x_1 x_2 + x_1 + x_2 x_3^2 + R_8 x_2 - R x_5 + 2 R_{10} x_2^2 + R_7 x_2 x_3 + R_9 x_2 x_4 = 0 \\
2 x_2 x_3^2 + 2 R_5 x_3^2 - 8 x_5 + R_6 x_3 + R_7 x_2 x_3 = 0 \\
R_9 x_2 x_4 + 2 x_4^2 - 4 R x_5 = 0 \\
x_1 x_2 + x_1 + R_{10} x_2^2 + x_2 x_3^2 + R_8 x_2 + R_5 x_3^2 + x_4^2 - 1 + R_6 x_3 + R_7 x_2 x_3 + R_9 x_2 x_4 = 0
\end{cases}
\]

and all the x_i must be positive. The results are depicted in Table 14 for an initial interval [0, 10^8].
They indicate that Newton is particularly effective on this problem, since it takes about 6 seconds
and proves the existence of a solution in the final intervals. Note that the continuation method of
[35] takes about 56 seconds on this problem.
             [-10,10]    [-10^2,10^2]    [-10^3,10^3]    [-10^4,10^4]
CPU time     0.91        11.69           172.71          2007.51
growth                   12.84           14.77           11.62
branching    52          663             9632            115377
na-ee        4290        57224           645951          6541038
na-te        810         6708            96888           1173456
na-tot       5100        63932           742839          7714494
fe-ee        11012       144843          1620538         16647442
fe-te        4104        48804           769056          9376632
fe-tot       15116       193647          2389594         26024074
pr-con       69          983             15980           195270
proof        yes         yes             yes             yes

Table 15: Newton on the Neurophysiology Problem.


6.8  A  Neurophysiology  Application 

We conclude the experimental section with an example illustrating the limitations of Newton.
The application is from neurophysiology [35] and consists of the following system of equations:

\[
\begin{cases}
x_1^2 + x_3^2 = 1 \\
x_2^2 + x_4^2 = 1 \\
x_5 x_3^3 + x_6 x_4^3 = c_1 \\
x_5 x_1^3 + x_6 x_2^3 = c_2 \\
x_5 x_1 x_3^2 + x_6 x_4^2 x_2 = c_3 \\
x_5 x_1^2 x_3 + x_6 x_2^2 x_4 = c_4
\end{cases}
\]

No initial intervals for the variables were given and the constants c_i can be chosen at random. The
continuation method of [35] solves this problem in about 6 seconds. The results of Newton are
depicted in Table 15 for various initial intervals. Newton is fast when the initial intervals are small
(i.e., [-10,10]). Unfortunately, the running time of the algorithm increases linearly with the size
of the initial intervals, showing a limitation of the method on this example.

7  Related  Work  and  Discussion 

The research described in this paper originated in an attempt to improve the efficiency of constraint
logic programming languages based on intervals such as BNR-Prolog [27] and CLP(BNR) [3]. These
constraint logic programming languages use constraint solving as a basic operation and they were
based on the simple generalization of arc-consistency described previously, i.e.,

\[
I_i = \mathrm{box}\{\, I_i \cap \{\, r_i \mid \exists r_1 \in I_1, \ldots, \exists r_{i-1} \in I_{i-1}, \exists r_{i+1} \in I_{i+1}, \ldots, \exists r_n \in I_n : c(r_1, \ldots, r_n) \,\} \,\}.
\]

This approximation was enforced on simple constraints such as

\[
x_1 = x_2 + x_3, \quad x_1 = x_2 - x_3, \quad x_1 = x_2 \times x_3
\]

and complex constraints were decomposed in terms of these simple constraints.
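As an illustration of how this approximation acts on one of the simple constraints, the following sketch narrows three intervals for x1 = x2 + x3. It is a minimal sketch under our own assumptions: intervals are plain `(lo, hi)` tuples, the helper name `narrow_sum` is hypothetical, and floating-point rounding is ignored. It is not the code of BNR-Prolog or CLP(BNR).

```python
def meet(a, b):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def narrow_sum(x1, x2, x3):
    """Narrow (x1, x2, x3) for the constraint x1 = x2 + x3.

    Each variable is intersected with the interval implied by the other two.
    Returns the narrowed triple, or None if the constraint cannot hold.
    """
    x1 = meet(x1, (x2[0] + x3[0], x2[1] + x3[1]))   # x1 in x2 + x3
    if x1 is None:
        return None
    x2 = meet(x2, (x1[0] - x3[1], x1[1] - x3[0]))   # x2 in x1 - x3
    if x2 is None:
        return None
    x3 = meet(x3, (x1[0] - x2[1], x1[1] - x2[0]))   # x3 in x1 - x2
    if x3 is None:
        return None
    return x1, x2, x3


result = narrow_sum((0.0, 2.0), (1.0, 5.0), (0.0, 10.0))
print(result)
```

Every value removed by this step has no support in the other two intervals, which is the sense in which the bounds become locally consistent.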
As mentioned earlier, this approach is not very effective [2] and our main goal was to design
new approximations of arc-consistency that could make use of existing interval methods. The main
problem was the difficulty in characterizing the pruning of the Newton operator N* in a declarative
way (in order to introduce it nicely in the above programming languages) and box-consistency
emerged as an attempt to generalize the operator to make sure that the bounds of the interval were
locally consistent. Subsequent research made us realize that box-consistency is independent of the
Newton operator and can be enforced even if the functions are not continuous or differentiable. In
addition, the value of applying box-consistency on several extensions became clear. On the one
hand, box-consistency on the Taylor extension generalizes interval methods based on Gauss-Seidel
iterations and enables us to capture nicely the Hansen-Sengupta operator. On the other hand,
box-consistency on the natural and distributed extensions is really orthogonal to the pruning obtained
from the Taylor expansion, producing a particularly effective algorithm. It is also worth pointing
out that Newton spends most of its time in the natural and distributed extensions. However, for many
applications, the use of the Taylor interval extension is critical to terminate the search quickly
and to avoid generating many small intervals around the solutions. As a general observation,
box-consistency on the natural and distributed extensions seems effective when far from a solution
while box-consistency on the Taylor expansion seems effective when near a solution. It is worth
mentioning that the interval community has spent much effort designing additional techniques
to further speed up the computation when near a solution but has not considered techniques to
improve pruning when far from a solution.

It is interesting to note that the idea of using approximations of arc-consistency was also used
independently by Hong and Stahl [11], who were also exposed to research on Constraint Logic
Programming. Their use of projections is however quite different from ours. The key idea is to
work with a set of boxes and to use projections to split a box into several subboxes by isolating
all zeros of a projection. This gives an algorithm of a very different nature which cannot easily be
characterized as a branch & prune algorithm since constraints are used to branch. Our approach
seems to be more effective in practice, since their use of projections may generate many subboxes
that may all need to be pruned away later on, implying much redundant work. Our approach
postpones the branching until no pruning takes place and generates subboxes only when they are
strictly necessary to progress. It is also very interesting to report that, on all benchmarks that we
tested, the projection never contained more than two zeros. This seems to indicate that searching
for all zeros may not be worthwhile in most cases and that box-consistency may be the right trade-off
here. Finally, note that their approach seems to use implicitly a distributed extension^10 but
they do not make use of the natural extension which is very important for some applications.
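The overall prune-then-branch control structure can be sketched for a single univariate polynomial as follows. This is a deliberately simplified illustration, not the Newton system: pruning here is a plain interval evaluation test (a box is discarded when the enclosure of the polynomial excludes zero) rather than box-consistency, there is no outward rounding, and the names `poly_enclosure` and `branch_and_prune` are ours.

```python
import math


def interval_mul(a, b):
    """Product of two closed intervals given as (lo, hi) tuples."""
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def poly_enclosure(coeffs, box):
    """Enclose p(box) by Horner's rule; coeffs are highest degree first."""
    acc = (coeffs[0], coeffs[0])
    for c in coeffs[1:]:
        acc = interval_mul(acc, box)
        acc = (acc[0] + c, acc[1] + c)
    return acc


def branch_and_prune(coeffs, box, eps=1e-6):
    """Return small boxes that may contain a zero of the polynomial."""
    solutions, stack = [], [box]
    while stack:
        lo, hi = stack.pop()
        f_lo, f_hi = poly_enclosure(coeffs, (lo, hi))
        if f_lo > 0 or f_hi < 0:
            continue                      # prune: no zero can lie in this box
        if hi - lo < eps:
            solutions.append((lo, hi))    # small enough: report the box
            continue
        mid = 0.5 * (lo + hi)
        stack.extend([(mid, hi), (lo, mid)])  # branch only when pruning fails
    return solutions


# Zeros of x^2 - 2 in the initial interval [-4, 5].
boxes = branch_and_prune([1.0, 0.0, -2.0], (-4.0, 5.0))
print(boxes)
```

A stronger narrowing operator would shrink each box before it is split, which is what reduces the number of branchings reported in the tables.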

Note also that our current implementation does not use some of the novel techniques of the
interval community such as the more advanced conditioners and splitting techniques of [13]. It is of
course possible to include them easily, since the overall recursive structure of the implementations
is essentially similar. Integrating these results would obviously be of benefit, since these techniques
are complementary to ours.

The  research  described  here  also  provides  a  uniform  framework  to  integrate  these  techniques 
in  Constraint  Logic  Programming,  to  understand  the  importance  of  the  various  pruning  operators 
and  their  relationships  and  to  suggest  further  research  directions.  For  instance,  higher  notions  of 
consistency  such  as  path-consistency  [19]  may  be  worth  investigating  for  some  applications. 

^10 The idea of sandwiching the interval function between two real functions is described there.
8 Conclusion


In this paper, we presented a branch & prune algorithm to find all isolated solutions to a system
of polynomial constraints over the reals. If the solutions are not isolated, the algorithm returns
boxes that contain several solutions. The algorithm is based on a single concept, box-consistency,
which is an approximation of arc-consistency, a notion well-known in artificial intelligence.
Box-consistency can be instantiated to produce the Hansen-Sengupta operator as well as other narrowing
operators which are more effective when the computation is far from a solution. The algorithm
and its mathematical foundations are simple. Moreover, the algorithm is shown to behave well
on a variety of benchmarks from kinematics, mechanics, chemistry, combustion, and economics.
It outperforms the interval methods we know of and compares well with continuation methods
on their benchmarks. In addition, problems such as the Broyden banded function and the
Moré-Cosnard discretization of a nonlinear integral equation can be solved for several hundred variables.
Limitations of the method (e.g., a sensitivity to the size of the initial intervals on some problems)
have also been identified.
Abstract


The  systems  and  concepts  described  in  this  paper  document  the  evolution  of  the  geometric  invariance 
approach  to  object  recognition  over  the  last  five  years.  Invariance  overcomes  one  of  the  fundamental 
difficulties  in  recognising  objects  from  images:  that  the  appearance  of  an  object  depends  on  viewpoint. 
This  problem  is  entirely  avoided  if  the  geometric  description  is  unaffected  by  the  imaging  transformation. 
Such  invariant  descriptions  can  be  measured  from  images  without  any  prior  knowledge  of  the  position, 
orientation  and  calibration  of  the  camera.  These  invariant  measurements  can  be  used  to  index  a  library 
of  object  models  for  recognition  and  provide  a  principled  basis  for  the  other  stages  of  the  recognition 
process  such  as  feature  grouping  and  hypothesis  verification.  Object  models  can  be  acquired  directly 
from  images,  allowing  efficient  construction  of  model  libraries  without  manual  intervention. 

A  significant  part  of  the  paper  is  a  summary  of  recent  results  on  the  construction  of  invariants  for  3D 
objects  from  a  single  perspective  view.  A  proposed  recognition  architecture  is  described  which  enables  the 
integration  of  multiple  general  object  classes  and  provides  a  means  for  enforcing  global  scene  consistency. 

Various  criticisms  of  the  invariant  approach  are  articulated  and  addressed. 


1  Introduction 

The  computer  recognition  of  objects  has  attracted  considerable  research  effort  over  the  last  25  years.  It 
is  now  widely  accepted  that  object  recognition,  in  the  setting  of  real  world  scenes  and  based  on  a  single 
perspective  view,  is  a  difficult  problem  and  cannot  be  achieved  without  the  use  of  object  models  to  guide 
the  processing  of  image  data  and  to  confirm  object  hypotheses.  It  is  also  accepted  that  the  most  reliable 
information  which  is  available  in  a  scene  is  derived  from  a  geometric  description  of  the  object  based 
on  its  projection  in  the  form  of  2D  geometric  image  features,  as  opposed  to,  for  example,  its  intensity 
shading.  Thus,  object  recognition  systems  draw  on  a  library  of  geometric  models,  which  usually  contain 
information  about  the  shape  and  appearance  of  a  set  of  known  objects,  to  determine  which,  if  any,  of 
those  objects  appear  in  a  given  image  or  image  sequence.  Recognition  is  considered  successful  if  the 
geometric  configuration  in  an  image  can  be  explained  as  a  perspective  projection  of  a  geometric  model 
of  the  object. 

At present, 3D recognition systems generally have small modelbases containing relatively simple
objects. Progress is needed on three fronts:

• Larger modelbases: Systems should be able to deal with modelbases containing hundreds to
thousands of models. The methods of pose consistency (reviewed in section 1.1), which are
commonly used for modelbases with only a few objects, are infeasible for large modelbases because
of the computational expense. Coping with such sizes clearly requires some partitioning of the
modelbase.

•  More  general  shape  models:  Typically  polyhedra  are  used,  which  are  a  poor  model  for  curved 
objects.  A  direct  representation  for  non-trivial  curved  objects  is  required. 

• Automatic segmentation and grouping: This is the process, also called figure-ground
separation, of extracting image feature groups which correspond to individual object outlines without
including the background and other occluding objects. The lack of such grouping is a significant
barrier to successful recognition in current systems. In addition to representing the shape of 3D
objects, models will have to provide mechanisms for their feature segmentation and grouping.

This  paper  establishes  a  framework  for  the  next  generation  of  3D  model  based  vision  recognition 
systems  which  will  have  large  modelbases,  with  objects  partitioned  into  a  number  of  different  3D  object 
classes.  Recognition  is  from  single  perspective  images  of  scenes,  where  the  camera  is  uncalibrated,  the 
objects  could  be  partially  occluded,  and  the  scene  might  contain  objects  not  in  the  model  library.  The 
object  classes  are  defined  geometrically  in  terms  of  symmetry  or  other  3D  geometric  constraints.  The 
constraint  enables  invariants  of  a  3D  object  in  the  class  to  be  extracted  from  a  single  image  of  the  object 
outline;  and  also  generates  invariant  relations  on  the  image  outline  that  enable  grouping. 
Although the paper concentrates on perspective images, the methods are, of course, applicable in
weak-perspective (or "affine") imaging situations. Weak-perspective, a linear approximation to perspective,
is appropriate as a camera model when object relief is small compared to distance from the camera. A
consequence is that parallel world lines are imaged as parallel lines. Invariants computed for perspective
imaging are also valid for weak-perspective.
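The parallelism property of weak perspective can be checked numerically. The sketch below is only an illustration under our own assumptions: a camera looking along the Z axis with focal length `f`, a single reference depth `z_ref` standing in for the average object depth, and hypothetical function names. Two parallel 3D segments are projected with both camera models, and the 2D cross product of their image directions is compared (zero means parallel).

```python
def weak_perspective(point, f=1.0, z_ref=10.0):
    """Scaled orthographic projection: every point shares the depth z_ref."""
    x, y, _ = point
    s = f / z_ref
    return (s * x, s * y)


def perspective(point, f=1.0):
    """Pinhole projection: each point is divided by its own depth."""
    x, y, z = point
    return (f * x / z, f * y / z)


def direction(p, q):
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """2D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


# Two 3D segments sharing the direction (2, 1, 2).
a0, a1 = (0.0, 0.0, 9.0), (2.0, 1.0, 11.0)
b0, b1 = (3.0, 1.0, 9.0), (5.0, 2.0, 11.0)

weak = cross(direction(weak_perspective(a0), weak_perspective(a1)),
             direction(weak_perspective(b0), weak_perspective(b1)))
full = cross(direction(perspective(a0), perspective(a1)),
             direction(perspective(b0), perspective(b1)))
print(weak, full)
```

Under weak perspective the two image segments stay parallel, whereas under full perspective they converge towards a vanishing point.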

A  major  constraint  underlying  the  work  presented  here  is  that  recognition  is  based  on  one  uncalibrated 
view  of  a  scene.  Our  motivation  is  that  this  restriction  applies  in  many  of  the  current  and  future 
applications  for  object  recognition,  such  as  aerial  surveillance,  image  database  query  processing,  and 
image-hypertext  editing.  Even  if  more  images  are  available,  for  example  in  the  case  of  video  processing, 
camera  calibration  will  not  generally  be  known  initially.  Any  grouping,  recognition  hypothesis,  or  object 
recovered  up  to  some  ambiguity  from  a  single  image,  can  be  propagated  to  advantage  to  subsequent 
views. 

A central question explored in this paper is the nature of the shape representation necessary for
recognition. Euclidean (metric) representations are routinely used in many existing recognition systems.
However, under the most general imaging conditions, structure is recovered up to a projective
transformation (i.e., a more general transformation than Euclidean). We demonstrate that projective representations
are adequate for recognition. A stratification of representations is provided by the hierarchy of
transformation groups: projective, affine, similarity (scaled Euclidean), and Euclidean. This representation
hierarchy is progressively more restrictive; for example, two objects that are projectively equivalent need
not be affine or similarity equivalent. We will be primarily concerned with the projective stratum, since
this covers the "worst case" ambiguity. The other strata will be used to advantage at particular stages
of the recognition process.

A related area is the use of quasi-invariants [4]. A quasi-invariant is an object property or relation
that is not invariant to projective transformations, but is stable over a useful range of views. Invariants
of other transformation groups in the hierarchy given above are sometimes quasi-invariants [5].
Quasi-invariants can be very effective in grouping and partial indexing even though they vary under perspective
projection. Examples of quasi-invariants are given in the paper.

Our geometric notion of class differs from the more usual functional one. For example, in our
definitions, a vase is considered as a surface of revolution as opposed to a container for flowers and water. A
geometric class is not specific to a particular object but instead describes a family of objects which are
unified by their common 3D constraint relations. A number of examples of these 3D object classes are
given in section 3.

We  have  defined  a  recognition  architecture  which  integrates  these  ideas.  Class  influences  each  level  of 
the  architecture,  from  image  grouping  through  to  organisation  of  the  model  base  and  3D  scene  constraints. 
Recognition  is  class  based,  proceeding  first  by  a  classification  based  on  image  curves,  and  subsequently  the 
identification  of  a  particular  model  within  the  class  using  values  of  geometric  attributes.  This  contrasts 
with  many  existing  recognition  systems  where  a  particular  object  is  directly  identified.  The  architecture, 
combined with the success of existing implementations, demonstrates that a large-scale system implementation
based on an invariant framework is now warranted. This effort will culminate in an object
recognition  system  that  can  recognise  a  broad  class  of  3D  structures  with  thousands  of  individual  object 
instances  in  the  model  library. 

1.1  Related  approaches  to  object  recognition 

Recognition is the establishment of a correspondence between image and model features. Most recent approaches
to recognition have been implemented in three stages (similar to those defined in [28]): grouping,
indexing,  and  verification. 

The aim of grouping (also called perceptual organisation [37], selection, or figure-ground discrimination)
is to provide an association of features that are likely to have come from a single object in a scene.
Features are typically grouped together using cues such as proximity, parallelism [3, 37], collinearity, and
approximate continuity in curvature [12, 62]. The indexing stage hypothesises an association between
the grouped image features and features on a model in the library. The final stage, verification, determines
the consistency of this hypothesis with the image data. The image-model match is used to project
the model onto the image, and the validity of the model hypothesis and the model-to-image feature
correspondences is determined by measuring image support.

There  are  three  distinct  categories  of  algorithm  that  have  been  used  to  compute  correspondence: 

1.  Interpretation  trees  frame  the  model-to-image  correspondence  task  as  a  search  tree  to  allow  all 
possible  model  and  image  feature  associations,  and  then  control  and  prune  this  search  process. 
Although  inefficient,  this  has  proved  reliable  for  planar  object  recognition  for  a  small  modelbase 
when  single  images  are  used  [27],  and  has  been  extended  by  Ettinger  to  include  useful  notions 
about  how  hierarchical  object  descriptions  can  be  realised  [14].  However,  interpretation  trees  are 
not  generally  able  to  work  with  single  images  of  three  dimensional  objects  (though  effective  when 
3D  data  is  provided  as  direct  input  to  the  system  [2,  27,  48,  49,  52]).  Interpretation  trees  are  not 
restricted  to  rigid  objects;  the  sup-inf  framework  for  geometric  reasoning  used  in  ACRONYM  [8] 
allows  the  interpretation  tree  to  account  for  tolerance  interval  constraints  on  parameterised  objects. 
Brooks’  work  has  been  extended  by  both  Fisher  [19]  and  Reid  [54]  for  different  types  of  sensor  and 
constraint framework. Other ways to treat parameterisations have been suggested by Grimson [28].

2. Hypothesise and test, also called alignment, first aligns a set of model and image features [31] to yield an
initial estimate of pose. This hypothesised alignment is tested by searching for other model-to-image
correspondences predicted by the model pose (verification). This algorithm has been implemented
for  a  variety  of  data  formats  and  feature  types  [1,  6,  15,  24,  38,  69].  In  fact,  extensions  to  3D  curved 
surfaces  have  even  been  created  [13,  33]. 

3. Pose clustering is implemented by computing the object pose from a group of features corresponding
to a particular model, and storing the estimate in an accumulator in pose space; if enough
local groups have the same pose, a hypothesis for the model is formed. This approach (frequently
called generalised Hough) has the disadvantage that the pose space is high dimensional (six dof for
3D  Euclidean  space),  so  searching  for  consistent  pose  is  expensive.  Two  ways  round  this  are  to 
use  a  decomposition  of  the  pose  space  into  separable  parameters  [44,  68],  or  to  use  an  adaptive 
Hough  transform  [66].  Another  approach  eliminates  the  requirement  to  quantise  the  pose  space 
into  rectangular  cells  by  constructing  a  quantisation  that  depends  both  on  the  estimates  of  pose, 
and on the expected error bounds of the pose measurements [10].

For  a  small  number  of  models,  for  example  two  or  three,  it  is  reasonable  simply  to  try  to  find  image 
feature  support  for  each  model.  This  approach  is  typical  of  many  existing  systems  [1,  2,  28,  31,  38,  48,  52]. 
As  the  size  of  the  model  library  increases,  this  approach  becomes  computationally  too  expensive.  It  is 
then  more  effective  to  choose  potential  models  from  the  library  based  on  the  observed  image  features. 
That  is,  image  feature  measurements  are  used  to  index  into  the  model  base.  In  constructing  such  index 
functions,  invariance  plays  a  major  role,  since  a  model  should  be  identified  irrespective  of  object  pose. 

1.2  Geometric  invariants  in  modelling  and  recognition 

Invariants  are  properties  of  geometric  configurations  which  remain  unchanged  under  an  appropriate  class 
of  transformations.  Within  the  context  of  vision  we  are  interested  in  determining  the  invariants  of 
an  object  under  perspective  projection  onto  an  image.  For  example,  for  a  planar  object  the  perspective 
projection  between  object  and  image  planes  is  a  projective  transformation.  Properties  such  as  intersection, 
collinearity,  and  tangency  are  unaffected  by  a  projective  transformation;  however,  invariant  values  can 
also  be  computed.  Examples  are  given  in  section  2.1. 

More formally, under a linear transformation of coordinates, X' = TX, the invariant, I(P), of a
configuration P transforms as

I(P') = |T|^w I(P)

and is called a relative invariant of weight w, where P' is the transformed configuration. If w = 0, the
invariant  is  unchanged  under  transformations  and  is  called  a  scalar  invariant.  We  will  only  be  interested 
in  scalar  invariants  in  this  paper. 
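
As a brief illustration of the definition (an example added here for orientation, not taken from the text), consider determinants of triples of homogeneous points x_i in the plane. Each determinant is a relative invariant of weight 1, and a ratio with equally many determinants above and below the line has weight 0. The shorthand D_{ijk} is an assumption of this sketch.

```latex
D_{ijk} = \det\,[\mathbf{x}_i \; \mathbf{x}_j \; \mathbf{x}_k],
\qquad
\det\,[\mathtt{T}\mathbf{x}_i \; \mathtt{T}\mathbf{x}_j \; \mathtt{T}\mathbf{x}_k]
  = |\mathtt{T}| \, D_{ijk}
\quad (w = 1),
\\[4pt]
\frac{D'_{431}\, D'_{521}}{D'_{421}\, D'_{531}}
  = \frac{|\mathtt{T}|^{2}\, D_{431}\, D_{521}}{|\mathtt{T}|^{2}\, D_{421}\, D_{531}}
  = \frac{D_{431}\, D_{521}}{D_{421}\, D_{531}}
\quad (w = 2 - 2 = 0).
```

Each point also appears the same number of times in the numerator and the denominator, so the arbitrary scale of the homogeneous coordinates cancels as well.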

In  general  we  seek  invariance  to  projective  transformations,  so  T  is  a  general  non-singular  square 
matrix acting on homogeneous coordinates. For planar configurations it is 3 x 3, and for 3D configurations
4 x 4. Note that invariants are computed with respect to a transformation, which is a mapping between
spaces of the same dimension. The goal is to measure the invariants from a perspective projection of the
configuration,  where  the  image  may  have  a  lower  dimension  than  the  object.  We  write  P  for  the  projection 
matrix  that  covers  a  3D  Euclidean  transformation  of  the  object  followed  by  perspective  projection  onto 
the  image.  For  planar  objects  the  original  and  image  spaces  are  the  same  dimension  and  P  is  simply  a 
projective  transformation  represented  by  a  3  x  3  matrix.  This  is  discussed  in  detail  in  section  2.  For 
three-dimensional  objects,  the  original  and  image  spaces  are  no  longer  of  the  same  dimension  and  P  is  a 
3 x 4 matrix mapping 3D homogeneous coordinates onto the image plane. This is described in detail in
section  3.2. 

1.2.1  Indexing 

One of the most important uses of invariants in vision is as indexing functions. In traditional model-based
recognition systems (section 1.1), recognition proceeds by hypothesising a correspondence between image
and  object  features,  and  then  evaluating  the  hypothesis  based  on  the  consistency  of  the  best  projection 
of  the  model  onto  the  image  features.  This  constitutes  simultaneously  finding  pose  and  performing 
recognition,  and  is  generally  of  a  complexity  linear  in  the  number  of  models  in  the  library,  since  each 
model  must  be  evaluated. 

An  index  function  provides  direct  access  to  a  certain  model  in  the  model  base  without  using  specific 
information  about  the  model,  or  model  pose  in  advance.  Ideally,  the  index  function  should  uniquely 
retrieve  a  model  from  the  library  (thus  facilitating  constant  time,  as  opposed  to  linear,  access  to  the 
library),  but  in  practice  it  is  likely  that  a  small  number  of  models  are  retrieved  with  the  same  index. 
Even  so,  the  search  cost  is  considerably  reduced  below  that  of  testing  the  full  library.  The  index  is 
typically  a  vector  of  independent  invariant  measurements. 

More  formally:  the  index  is  considered  to  be  a  vector,  M,  which  selects  a  particular  model  from  the 
library.  The  index  is  a  function  M(f)  of  a  set  of  projected  object  features  only,  where  f  =  PF,  with  F 
object  features,  and  f  the  corresponding  image  features.  Assuming  that  M  can  be  computed  from  any 
image  projection  of  the  object  features,  then  library  values  for  M  can  be  constructed  simply  by  acquiring 
one  or  a  few  images  of  the  object  in  isolation. 

For  planar  objects,  P  is  a  planar  projective  transformation,  T,  from  the  object  in  an  arbitrary  pose 
onto  the  image  plane,  and 


M(T(F))  =  M(F) 

i.e.,  the  index  has  the  same  value  computed  on  the  original  object  and  after  the  transformation  (a  scalar 
invariant).  Each  element  of  the  index  vector  M  is  an  invariant  measure  computed  from  a  group  of  image 
features  such  as  conics,  lines,  points  and  plane  curve  segments.  A  typical  example  is  shown  in  figure  1. 
For  3D  objects  the  same  function  cannot  be  applied  to  object  and  image,  since  they  differ  in  dimension. 
However,  again  M  is  defined  so  that  each  element  is  a  projective  invariant  of  the  3D  structure  that  is 
measured  from  the  perspective  image.  Examples  are  given  in  section  3.2. 

1.2.2  Invariance  and  representation 

The  term  “invariance”  does  not  simply  refer  to  the  viewpoint-invariant  measurement  vector  described 
above.  The  term  also  includes  the  idea  of  an  invariant  relation,  which  is  distinct  from  an  invariant  value. 
For  example,  the  cross-ratio  is  an  invariant  value  of  four  collinear  points.  The  collinearity  of  the  points  is 
a  projectively  invariant  relation  between  the  points  which  is  independent  of  the  cross-ratio  value.  In  the 
definition  of  generic  geometric  classes,  the  identification  of  invariant  relations  is  often  a  more  important 
issue  for  representation  than  the  computation  of  specific  invariant  indexing  values. 
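
The cross-ratio example can be checked numerically. The sketch below is illustrative only: the point coordinates, the 2 x 2 matrix and the function names are hypothetical, and the cross-ratio is written in one of several equivalent conventions. It parameterises four collinear points by their position along the line and applies an arbitrary non-singular 1D projective map.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio of four collinear points given by 1D coordinates along the line."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))


def project_1d(t, m):
    """Apply the 1D projective map t -> (m00*t + m01) / (m10*t + m11)."""
    (m00, m01), (m10, m11) = m
    return (m00 * t + m01) / (m10 * t + m11)


# Hypothetical data: four points on a line and an arbitrary non-singular map.
points = [0.0, 1.0, 3.0, 7.0]
m = ((2.0, 1.0), (0.5, 3.0))  # determinant 2*3 - 1*0.5 = 5.5, so non-singular

before = cross_ratio(*points)
after = cross_ratio(*(project_1d(t, m) for t in points))

# The two values agree up to floating-point rounding.
print(before, after)
```

The collinearity of the points is the invariant relation that makes the value well defined in the first place; the number itself is what would be stored as an index.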

Another general aspect of the invariant approach is the symbiotic application of geometric and algebraic
analysis. It is often the case that geometric insights provide the first clue to the nature of invariants
for  a  particular  object  class.  Then  subsequent  algebraic  analysis  can  generalise  and  simplify  invariant 
computation,  and  in  turn  provide  additional  insight. 

1.2.3  Model  acquisition  from  images

A  model  consists  of  the  set  of  significant  geometric  features  of  the  object  boundary  known  up  to  a 
projective,  or  more  restrictive,  transformation  (for  example  affine).  Projective  models  can  be  constructed 
from  images  without  requiring  knowledge  of  the  intrinsic  camera  parameters  or  known  3D  ground  control 
points. In the case of 2D objects the model can be acquired from a single image; for 3D objects more
images  are  generally  required.  Model  acquisition  is  discussed  further  in  sections  2.3  and  4.3. 


Before  proceeding  to  the  case  of  more  general  3D  object  recognition,  we  review  a  mature  system  for  2D 
object  recognition.  This  review  will  illustrate  many  of  the  issues  in  object  recognition  by  invariants  and 
provide  a  context  for  our  more  general  discussion  of  recognition  architectures  at  the  end  of  the  paper. 

Notation 

We  adopt  the  notation  that  corresponding  entities  in  two  different  coordinate  frames  are  distinguished 
by  upper  and  lower  case.  In  general  lower  case  is  used  for  image  quantities,  and  upper  for  3D  quantities. 
Vectors  are  written  in  bold  font,  e.g.,  x  and  X.  Matrices  are  written  in  typewriter  font,  e.g.,  c  and  C. 
With  homogeneous  quantities,  equality  is  up  to  a  non-zero  scale  factor. 

For  smooth  surfaces  the  profile  (also  called  the  apparent  contour)  is  the  outline  of  the  surface  in  the 
image. It is the image projection of a surface curve, the contour generator, where rays from the optical
centre  are  contained  in  the  surface  tangent  plane. 


2  The  planar  recognition  system 

The  use  of  planar  projective  invariants  for  planar  object  recognition  is  particularly  appropriate  and 
straightforward  because  a  projective  transformation  between  object  and  image  planes  covers  all  the 
major  imaging  transformations:  the  plane  to  plane  projectivity  models  the  composed  effects  of  3D  rigid 
rotation  and  translation  of  the  world  plane  (camera  extrinsic  parameters),  perspective  projection  to 
the  image  plane,  and  an  affine  transformation  of  the  final  image  which  covers  the  effects  of  camera 
intrinsic parameters. Consequently, projective invariants, which are unaffected by all of
these parameters, have a high currency for this domain [40, 50, 55, 57, 58, 60, 70, 71].

Here  we  summarise  the  main  features  of  a  planar  object  recognition  system  that  has  been  developed 
during  the  past  four  years.  The  projective  representation  of  shape  used  in  the  system  has  the  key 
advantages  of  simple  model  acquisition  (direct  from  images),  no  need  for  camera  calibration  or  object 
pose  computation,  and  the  use  of  index  functions.  Recognition  proceeds  by  measuring  invariants  in 
the  target  image.  The  invariants  are  used  to  construct  index  vectors  to  select  models  from  the  library. 
If  the  index  value  coincides  with  that  associated  with  a  model,  a  recognition  hypothesis  is  generated. 
Recognition hypotheses corresponding to the same object are merged to form joint hypotheses, provided
they  are  geometrically  compatible.  The  (joint)  hypotheses  are  then  verified.  The  system^  has  been  tested 
on  a  large  set  of  images  and  under  varying  levels  of  occlusion  and  clutter.  A  detailed  description  of  this 
system  appears  in  [60]. 

The  projective  nature  of  the  representation  is  utilised  at  a  number  of  stages  in  the  recognition  process, 
for  example  in  both  model  acquisition  and  verification.  In  acquisition,  any  image  provides  a  projective 
model of the object outline because the image and object planes are related by a projective transformation.
This is because the object is mapped by a perspective transformation onto the image, and
perspective  is  a  restricted  form  of  a  projective  transformation.  In  verification,  the  target  image  outline 
is  projectively  related  to  the  model  image  outline.  This  follows  because  the  target  image  outline  is  a 
projective  transformation  of  the  object  outline,  which  is  a  projective  transformation  of  the  model  image 
outline.  Plane  projective  transformations  are  a  group,  and  a  sequence  of  projective  transformations  is 
equivalent  to  a  single  projective  transformation  (group  closure). 
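
Group closure is easy to confirm numerically. The following is a minimal sketch with two made-up 3 x 3 matrices (hypothetical values, not taken from the system): mapping a point through one projectivity and then another gives the same image point as mapping it once through the matrix product.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_h(h, point):
    """Map an inhomogeneous 2D point through the projectivity h."""
    x, y = point
    u, v, w = (h[i][0] * x + h[i][1] * y + h[i][2] for i in range(3))
    return (u / w, v / w)


# Hypothetical projectivities, e.g. object plane -> view 1 and view 1 -> view 2.
h1 = [[1.0, 0.2, 3.0], [0.1, 1.0, -2.0], [0.001, 0.002, 1.0]]
h2 = [[0.9, -0.1, 5.0], [0.2, 1.1, 1.0], [0.0005, -0.001, 1.0]]

p = (10.0, 20.0)
two_steps = apply_h(h2, apply_h(h1, p))
one_step = apply_h(matmul3(h2, h1), p)

# Both routes give the same image point (up to rounding).
print(two_steps, one_step)
```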

^The system is called LEWIS. The motivation for this name is explained in section 4.4.
Figure  1:  The  lines  used  to  compute  the  five  line  planar  projective  invariant  for  the  above  images  are  highlighted
in  white.  The  values  are  given  in  table  1. 


Five line invariants

Measured on      I1       I2
Object           0.840    1.236
Figure 1(a)      0.842    1.234
Figure 1(b)      0.840    1.232
Figure 1(c)      0.843    1.234

Table  1:  Values  of  plane  projective  invariants  measured  on  the  object,  and  from  images  with  varying  perspective 
effects.  The  values  vary  (due  to  measurement  noise)  by  less  than  0.4%. 


2.1  Projective  invariants  used 

There  are  three  different  algebraic  invariant  constructions  used  in  the  system:  five  lines;  a  conic  and  two 
lines;  and  a  conic  pair.  For  example  the  two  invariants  of  five  lines  are  given  by 


I_1 = \frac{|M_{431}|\,|M_{521}|}{|M_{421}|\,|M_{531}|}, \qquad I_2 = \frac{|M_{421}|\,|M_{532}|}{|M_{432}|\,|M_{521}|}    (1)

where M_{ijk} = (l_i, l_j, l_k) is the 3 x 3 matrix whose columns are three of the lines, |M| is the determinant, and l = (l_1, l_2, l_3)^T is the homogeneous representation of
a line: l_1 x + l_2 y + l_3 = 0. (See [45] for the other invariants.) Table 1 gives examples of these invariants
computed  from  the  images  shown  in  figure  1,  which  have  varying  degrees  of  perspective  distortion. 
These  are  applicable  to  image  curves  that  are  “algebraic”  (lines,  conics).  For  non-convex  smooth  curve 
segments  canonical  frame  invariants  [45,  57]  are  used.  These  are  constructed  from  projective  coordinates 
of  a  concavity  delineated  by  a  bitangent. 
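
The five line construction can be sketched in a few lines of code. This is an illustrative sketch, not the system's implementation: the line coordinates and the matrix are invented, the function names are hypothetical, and the determinant ratios follow the reading of equation (1) in which each term is the determinant of three line vectors. A point transformation x' = Tx acts on lines through the inverse transpose of T, which is just another non-singular matrix, so the sketch applies an arbitrary non-singular matrix directly to the line vectors and also rescales each line independently.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are the vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def five_line_invariants(lines):
    """Two invariants of five lines; `lines` holds l_1..l_5 as homogeneous 3-vectors."""
    def m(i, j, k):
        return det3(lines[i - 1], lines[j - 1], lines[k - 1])

    i1 = (m(4, 3, 1) * m(5, 2, 1)) / (m(4, 2, 1) * m(5, 3, 1))
    i2 = (m(4, 2, 1) * m(5, 3, 2)) / (m(4, 3, 2) * m(5, 2, 1))
    return i1, i2


def transform_line(a, line, scale=1.0):
    """Apply the 3x3 matrix `a` to a line vector, then rescale it arbitrarily."""
    return tuple(scale * sum(a[r][c] * line[c] for c in range(3)) for r in range(3))


# Hypothetical lines in general position (no three concurrent).
lines = [(1, 2, 3), (2, -1, 1), (0, 1, -2), (3, 1, 1), (1, -1, 4)]
a = [[2, 1, 0], [0, 1, 1], [1, 0, 3]]  # non-singular: determinant 7
scales = [1.0, -2.0, 0.5, 3.0, 10.0]   # homogeneous scale is arbitrary per line

moved = [transform_line(a, l, s) for l, s in zip(lines, scales)]
original = five_line_invariants(lines)
transformed = five_line_invariants(moved)

# The pairs agree up to floating-point rounding.
print(original, transformed)
```

Each line index appears equally often in the numerator and denominator of both ratios, which is why neither the determinant of the transformation nor the per-line scale survives.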

In  all  cases  there  is  tolerance  to  partial  occlusion,  i.e.,  the  invariants  can  still  be  formed  if  part  of 
the outline is occluded. This is a result of using semi-local invariant descriptions, i.e., not global, like
moments of the entire boundary, and redundancy: there are a number of different descriptors for each
object  so  that  there  is  not  an  excessive  requirement  for  any  single  object  region  to  be  visible.  In  the 
algebraic  case  lines  and  conics  can  still  be  extracted  if  part  of  the  curve  is  occluded. 


2.2  Architecture 

The  stages  of  recognition  are  shown  in  figure  2.  In  the  following  sections  we  describe  these  stages  in 
sufficient  detail  to  expose  the  important  issues  for  consideration  in  extending  these  ideas  to  3D  object 
recognition. 
Figure  2:  The  recognition  system  has  a  single  grey  scale  image  as  input  and  the  outputs  are  verified  hypotheses
with  associated  confidence  values.  Many  of  the  processes  are  shared  by  the  acquisition  and  the  recognition 
paths.  The  recognition  system  is  similar  to  previous  systems  [28]  in  all  but  the  indexing  and  hypothesis  merging 
stages. 


2.2.1  Feature  extraction  and  invariant  formation 

The  goal  of  the  segmentation  is  the  extraction  of  geometric  primitives  suitable  for  constructing  invariants. 
In  the  algebraic  case  this  involves  straight  lines  and  conics,  and  for  non-algebraic  curves,  concavities 
delineated  by  bitangents.  An  example  of  algebraic  segmentation  is  shown  in  figure  4. 

A local implementation of Canny’s edge detector is used to find edgels to sub-pixel accuracy. These
edgels  are  linked  into  chains,  extrapolating  over  any  small  gaps.  Considerable  advantage  is  made  of  local 
image  feature  topology.  In  many  recognition  systems,  the  local  connectivity  of  edgel  chains  and  fitted 
features  is  ignored;  but  we  have  found  that  feature  grouping,  based  on  the  connectivity  provided  by  edgel 
chains  and  proximity,  allows  index  formation  to  have  a  low  complexity  with  respect  to  the  number  of 
image  features. 

For  algebraic  invariants,  connectivity  enables  efficient  linking  and  ordering  of  line  segments.  For 
example,  five  line  invariants  are  formed  from  sets  of  consecutive  lines  within  single  edgel  chains  at  a 
cost that is linear in the number of lines in the scene (i.e., O(l), compared to O(l^5) if all groupings are
attempted).  For  concavities,  the  curve  again  provides  an  ordering  for  the  feature  points  used  (bitangent 
and  cast  tangent  points  [73])  and  only  the  two  cases  of  global  curve  reversal  have  to  be  considered. 
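
The saving from connectivity can be made concrete with a small count. The sketch below assumes a single open edgel chain of 20 fitted lines with hypothetical labels; it compares the number of runs of five consecutive lines with the number of unordered five-line subsets that exhaustive grouping would have to consider.

```python
from math import comb


def consecutive_groups(chain, size=5):
    """All runs of `size` consecutive lines along one open edgel chain."""
    return [tuple(chain[i:i + size]) for i in range(len(chain) - size + 1)]


# Hypothetical chain of 20 fitted line segments, in order along the chain.
chain = [f"l{i}" for i in range(1, 21)]

windows = consecutive_groups(chain)
exhaustive = comb(len(chain), 5)  # unordered subsets, ignoring line ordering

print(len(windows))  # 16 groups, growing linearly with chain length
print(exhaustive)    # 15504 subsets, growing as the fifth power
```

Exhaustive grouping would also have to try different orderings of the five lines within each subset, so the count above is a lower bound on its cost.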

Once  sets  of  grouped  features,  f,  have  been  produced,  the  algebraic  and  canonical  invariants  are 
computed.  Each  set  of  grouped  features,  or  concavity  curve,  generally  produces  a  number  of  invariant 
values which are collected into a vector M(f). The invariant vector formed by the above process represents
a point in the multidimensional invariant space. The space is quantised to enable hashing. Each object
feature group is represented by a collection of points that define a region in the invariant space, the size
of which depends upon the measured variance in the invariant value^.

^See  section  2.3. 

2.2.2  Indexing  to  generate  recognition  hypotheses

The  invariant  values  computed  from  the  target  image  are  used  to  index  against  invariant  values  in  the 
library.  If  the  value  is  in  the  library  a  preliminary  recognition  hypothesis  is  generated  for  the  corresponding 
object. Each type of invariant (e.g., five lines, conic pair) generates hypotheses separately.

This  process  is  made  more  efficient  using  a  hash  table  that  allows  simultaneous  indexing  on  all  elements 
of  the  measurement  vector.  In  the  experiments  to  date  there  has  not  been  any  significant  problem  with 
collisions  in  the  hash  table.  Hash  table  collisions^  should  not  be  confused  with  the  intersection  of  object 
invariant  measurements  in  index  space.  These  intersections  lead  to  erroneous  hypotheses  which  cost  some 
effort  during  the  verification  stage,  but  are  usually  eliminated. 
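
A minimal sketch of this kind of quantised index is given below. It is not the system's data structure: the bin width, the tolerance, the model name and the function names are all assumptions made for illustration. A model is entered into every bin that its invariant vector may fall in, given a tolerance standing in for the measured variance, and a query is a single dictionary lookup on the quantised measurement. The numbers are the five line values from table 1.

```python
from itertools import product
from math import floor

BIN_WIDTH = 0.01  # hypothetical quantisation step of the invariant space


def bin_index(value):
    """Quantise one invariant value to an integer bin."""
    return floor(value / BIN_WIDTH)


def add_model(table, name, mean, tol):
    """Enter `name` in every bin covered by mean +/- tol in each dimension."""
    ranges = [range(bin_index(m - t), bin_index(m + t) + 1)
              for m, t in zip(mean, tol)]
    for key in product(*ranges):
        table.setdefault(key, set()).add(name)


def lookup(table, measured):
    """Return the models indexed by a measured invariant vector."""
    key = tuple(bin_index(v) for v in measured)
    return sorted(table.get(key, ()))


table = {}
# Values measured on the object (table 1); the name and tolerance are made up.
add_model(table, "bracket", mean=(0.840, 1.236), tol=(0.004, 0.004))

# Values measured in Figure 1(a) retrieve the same model.
hypotheses = lookup(table, (0.842, 1.234))
print(hypotheses)
```

The lookup cost does not depend on the number of models stored, which is the property that distinguishes indexing from testing each model in turn. Two different models whose regions overlap would both be returned from the same bin and separated later by verification.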

2.2.3  Hypothesis  merging 

Many  collections  of  primitives  may  come  from  the  same  model  instance:  for  example,  an  object  consisting 
of  a  square  plate  with  a  circular  hole  in  it  admits  four  collections,  each  consisting  of  a  conic  and  two 
connected  lines.  Each  collection  has  an  invariant  which  may  generate  a  recognition  hypothesis.  Such  a  set 
of  recognition  hypotheses  is  compatible  if  a  single  model  instance  could  explain  all  of  them  simultaneously. 
Prior to verification, compatible hypotheses are combined into joint hypotheses. There are a number of
reasons  why  hypothesis  merging  is  desirable: 

1.  Backprojection  and  searching  for  image  support  is  computationally  expensive  and  it  is  more  efficient 
to  validate  several  hypotheses  of  the  same  object  together. 

2. More features facilitate a more accurate least squares calculation of the back projection transformation
(there are more matched model and image features), and consequently a reduced error in
measuring  image  support. 

3. Many hypotheses indexing the same object in a single part of the scene significantly increase confidence
that the match is correct.

The  hypothesis  merging  process  is  equivalent  to  forming  an  interpretation  tree  for  the  indexed  object 
based on the features which index a particular model. The merging is controlled by topological and geometric
compatibility. The topological consistency (ordering and connectedness) is illustrated in figure 3a.
Geometric  consistency  is  implemented  efficiently  by  a  second  use  of  invariants  —  this  time  joint  invariants 
between  the  feature  groups  used  to  compute  each  individual  hypothesis.  This  is  illustrated  in  figure  3b. 

2.2.4  Verification 

There  are  two  steps  involved  in  verification,  both  of  which  can  reject  a  (joint)  recognition  hypothesis. 
The  first  is  to  attempt  to  compute  a  common  projective  transformation  between  the  model  features  and 
the  putative  corresponding  features  in  the  target  image.  The  second  is  to  use  this  transformation  to 
project  the  entire  model  onto  the  target  image,  and  then  measure  image  support. 

Incorrect  hypotheses  arise  because  grouped  image  features  happen  to  have  an  invariant  value  that 
coincides  (within  the  error  bounds)  with  one  in  the  library.  The  features  used  to  produce  the  matching 
model and image invariants provide sufficient constraints to compute the projective transformation between
the model and image. In general this will be over-constrained: many more constraints than the
eight  unknowns  of  the  projective  transformation  are  available.  Consequently,  if  a  common  transformation 
cannot  be  computed  the  features  are  not  projectively  equivalent  and  the  hypothesis  is  rejected. 
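
The first verification step can be sketched as follows, under simplifying assumptions that are not taken from the system: only point correspondences are used (the system also uses lines and conics), the last matrix element is fixed to 1 so that exactly eight unknowns remain, the over-constrained equations are solved through the normal equations, and the acceptance tolerance is a made-up value. All function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_projectivity(model_pts, image_pts):
    """Least-squares projectivity (h33 fixed to 1) mapping model_pts to image_pts."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    # Normal equations (A^T A) h = A^T b for the eight unknowns.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_h(h, point):
    """Map an inhomogeneous 2D point through the projectivity h."""
    x, y = point
    u, v, w = (h[i][0] * x + h[i][1] * y + h[i][2] for i in range(3))
    return (u / w, v / w)


def max_transfer_error(h, model_pts, image_pts):
    """Largest distance between a mapped model point and its image point."""
    return max(math.hypot(p[0] - q[0], p[1] - q[1])
               for p, q in zip((apply_h(h, m) for m in model_pts), image_pts))


def projectively_consistent(model_pts, image_pts, tol=0.1):
    """Accept the hypothesis only if one projectivity explains all matches."""
    h = fit_projectivity(model_pts, image_pts)
    return max_transfer_error(h, model_pts, image_pts) <= tol
```

With seven or more matched points the fourteen or more equations exceed the eight unknowns, so a set of matches that is not projectively equivalent leaves a residual that no single transformation can remove.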

Backprojection  and  subsequent  searching  involves  the  entire  model  boundary,  not  just  the  features 
used to form the invariant. Projected model edgels must lie close to image edgels with similar orientation
(within 5 pixels and 15°). If more than a certain proportion of the projected model data is supported
(the  threshold  used  is  50%),  there  is  sufficient  support  for  the  model,  and  the  recognition  hypothesis  is 
confirmed.  The  final  part  of  the  process  is  expensive  as  O(IO^)  edgels  need  to  be  mapped  onto  the  image. 
Efficiency  is  improved  by  approximating  the  distances  using  the  3-4  distance  transform  of  Borgefors  [7]. 
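
The second verification step can be sketched with a two-pass 3-4 chamfer distance transform. This is a simplified illustration rather than the system's code: the orientation test is omitted, edgels are taken at integer pixel positions, and the function names are hypothetical. In the 3-4 transform a horizontal or vertical step costs 3 and a diagonal step costs 4, so a distance of 5 pixels corresponds to a value of about 15.

```python
INF = 10 ** 9


def chamfer_34(edges):
    """Two-pass 3-4 chamfer distance to the nearest edge pixel.

    `edges` is a 2D list of booleans; the result is in units where one
    horizontal or vertical pixel step costs 3 and one diagonal step costs 4.
    """
    h, w = len(edges), len(edges[0])
    d = [[0 if edges[y][x] else INF for x in range(w)] for y in range(h)]
    forward = ((0, -1, 3), (-1, -1, 4), (-1, 0, 3), (-1, 1, 4))
    backward = ((0, 1, 3), (1, 1, 4), (1, 0, 3), (1, -1, 4))
    # Forward pass: top-left to bottom-right, using already visited neighbours.
    for y in range(h):
        for x in range(w):
            for dy, dx, cost in forward:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    d[y][x] = min(d[y][x], d[yy][xx] + cost)
    # Backward pass: bottom-right to top-left.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            for dy, dx, cost in backward:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    d[y][x] = min(d[y][x], d[yy][xx] + cost)
    return d


def support(dist, projected, max_pixels=5):
    """Fraction of projected model edgels (x, y) lying within max_pixels of an edge."""
    h, w = len(dist), len(dist[0])
    limit = 3 * max_pixels
    hits = sum(1 for x, y in projected
               if 0 <= y < h and 0 <= x < w and dist[y][x] <= limit)
    return hits / len(projected)


# Tiny example: a single edge pixel in the centre of a 5x5 image.
edges = [[(x, y) == (2, 2) for x in range(5)] for y in range(5)]
dist = chamfer_34(edges)
```

The distance image is computed once per target image, after which each projected model edgel costs a single array lookup; a hypothesis would then be confirmed when `support` exceeds the 50% threshold quoted above.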

^  A  hash  table  collision  occurs  when  a  number  of  models  have  the  same  hash  index.  Such  a  collision  can  occur  when  the 
number  of  hash  buckets  is  smaller  than  the  model  population  or  when  the  hashing  function  is  not  uniform  and  causes  many 
models  to  hash  to  the  same  bucket. 
Figure 3: Hypothesis compatibility: (a) If the same model is indexed by a five-line invariant (due to lines l_i,
i ∈ {1, ..., 5}), and a conic three-line invariant that is compatible with it (due to C and l_i, i ∈ {2, ..., 4}),
then  it  is  wise  to  verify  both  hypotheses  together.  The  invariants  are  compatible  if  the  ordering  of  the  image 
lines  are  consistent  with  those  on  the  model,  (b)  For  a  pair  of  concavity  curves  there  are  8  distinguished 
points  which  could  be  used  to  form  2  x  8  -  8  =  8  different  five  point  invariants.  Rather  than  computing  so 
many,  which  is  unnecessary,  invariants  are  computed  between  the  four  distinguished  points  of  each  concavity, 
and  the  ‘central’  point  of  the  other.  This  yields  four  invariants,  and  does  so  using  a  symmetric  construction. 
These  invariants  are  sufficient  to  hypothesise  compatibility. 

2.3  Model  acquisition  and  library  formation 

One  benefit  of  using  only  projective  representations,  rather  than  Euclidean  ones,  is  that  a  model  can 
be  acquired  directly  from  an  image.  No  special  orientations  or  calibrations  are  required.  Acquisition  is 
simple  and  semi-automatic  (for  instance,  curves  do  not  have  to  be  matched  entirely  by  hand  between 
images),  using  the  same  software  for  segmentation  and  invariant  computation  as  used  during  recognition. 

A  model  consists  of  the  following:  a  name;  a  set  of  edges  from  an  acquisition  view  of  the  object 
(used  in  the  backprojection  stage  of  verification);  the  lines,  conics  and  concavities  fitted  to  the  edges; 
the  expected  invariant  values  and  to  which  algebraic  features  and  curve  portions  they  correspond.  (The 
mean  and  variance  of  the  invariant  values  are  computed  from  a  variety  of  ‘standard’  viewpoints  of  the 
object.);  and,  finally,  topological  connectivity  and  geometric  relations  between  feature  groups  used  in  the 
construction  of  joint  invariants. 

The library is partitioned into different sub-libraries, one for each type of invariant (e.g., one for the
five-line  invariant,  another  for  the  conic  pair).  Each  sub-library  then  has  a  list  of  each  of  the  invariant 
values  tagged  with  an  object  name,  and  is  structured  as  a  hash  table. 

2.4  Recognition  examples 

Only  a  small  number  of  examples  are  included  since  others  appear  elsewhere  [45,  58,  60].  In  each  case 
successful  recognition  is  demonstrated  by  projecting  the  model  outline  onto  the  image.  Segmentation  for 
algebraic  features  is  shown  in  figure  4.  The  two  objects  in  the  scene  which  are  contained  in  the  library 
are  successfully  recognised  using  algebraic  invariants  computed  from  these  features  despite  substantial 
occlusion  and  clutter.  1049  invariants  are  computed  which  index  41  hypotheses.  These  are  converted  into 
131 joint hypotheses^ that have to be verified, of which 13 are rejected by first stage verification, based
on  valid  projective  transformations,  and  78  require  the  second  stage,  based  on  image  support.  Figure  5 
shows  recognition  based  on  canonical  frame  invariants.  The  algebraic  and  canonical  frame  invariants  can 
be  independently  applied  to  an  image  to  recognise  objects  of  both  types.  Figure  6  shows  an  example  of 

^The  joint  hypothesis  list  consists  of  combinations  of  compatible  hypotheses,  together  with  all  the  original  hypotheses. 
Figure  4:  (a)  A  scene  containing  two  objects  from  the  model  base,  with  fitted  lines  (100  of  them)  and  conics
(27)  superimposed  in  (b).  These  numbers  are  typical  for  images  of  this  type.  Note  that  many  lines  are  caused 
by  texture,  and  that  some  of  the  conics  correspond  to  edge  data  over  only  a  small  section.  The  lines  form 
70  different  line  groups,  (c)  shows  the  two  objects  correctly  recognised,  the  lock  striker  plate  matched  with  a 
single  invariant  and  50.9%  edge  match,  and  the  spanner  with  three  invariants  and  70.7%  edge  match. 


Figure  5:  Single  concavities  are  sufficient  to  recognise  the  two  model  instances  shown  in  (b).  The  redundancy 
of  the  canonical  frame  representation  gives  much  better  tolerance  to  occlusion  than  global  shape  methods. 
The  left  hand  object  gained  67.1%  boundary  support,  and  the  right  object  81.6%. 


recognition  for  both  index  naethods  together. 

2.5  Summary  of  performance 

Figure  7  shows  data  collected  over  fifty  evaluations  of  the  recognition  system  in  which  a  single  object 
from  the  model  base  was  placed  in  a  scene  and  partially  occluded  by  other  objects  that  are  not  in  the 
model  base  (clutter).  The  average  number  of  hypotheses  computed  as  more  models  were  added  to  the 
library  is  plotted.  The  first  model  added  to  the  library  always  corresponded  to  the  actual  object  in  the 
scene.  With  33  models  in  the  library,  on  average  15.8%  of  the  hypotheses  were  for  the  correct  model. 
Although predominantly linear, the graph has a very low gradient.

The  real  benefit  of  indexing  becomes  apparent  when  one  considers  how  many  hypotheses  would  be 
produced  if  an  alignment  technique  is  used,  maintaining  the  same  grouping  methods.  On  average,  over 
2000  feature  groups  are  produced  for  each  image,  and  so  2000  hypotheses  would  be  generated  for  each 
model  feature  group  in  the  library  (generally  there  are  four  or  five  feature  groups  per  object  and  so  the 
situation would be far worse). This would result in about $7 \times 10^4$ hypotheses for the entire model base
compared  to  fewer  than  60  produced  when  indexing  is  used.  As  these  all  have  to  be  verified  it  is  clear 
that  indexing  produces  a  dramatic  improvement  in  the  system  efficiency. 
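
The comparison above is simple arithmetic, and a minimal sketch makes the scale explicit. The function name and the bounds of one to five feature groups per model are illustrative assumptions; the figures 2000, 33 and 60 are taken from the text.

```python
def alignment_hypotheses(image_groups: int, model_groups: int) -> int:
    """Alignment pairs every image feature group with every model feature group."""
    return image_groups * model_groups


IMAGE_GROUPS = 2000   # "over 2000 feature groups" per image, from the text
MODELS = 33           # library size quoted in the text
INDEXED = 60          # "fewer than 60" hypotheses when indexing is used

# Assumed bounds: one feature group per model (optimistic) up to five per model.
lower = alignment_hypotheses(IMAGE_GROUPS, MODELS * 1)
upper = alignment_hypotheses(IMAGE_GROUPS, MODELS * 5)

print(lower, upper, lower // INDEXED)
```

Even under the optimistic bound, alignment would have to verify roughly a thousand times more hypotheses than indexing.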


Figure 6: A demonstration that both types of invariant index can be used to recognise objects in a single
image.  The  bracket  is  indexed  using  algebraic  invariants  and  the  spanner  is  indexed  using  the  canonical  frame 
signature. 


Figure  7:  The  number  of  hypotheses  that  have  to  be  verified  as  the  number  of  models  in  the  library  is  varied. 
The  results  show  an  average  over  fifty  scenes  containing  only  one  object  in  the  library,  but  with  other  clutter 
and  occlusion  present.  Over  2000  indexes  are  created  for  the  scene,  which  corresponds  to  the  number  of 
hypotheses  that  would  have  to  be  verified  per  model  feature  group  if  alignment  is  used.  Therefore,  there 
is  a  rapid  linear  growth  in  the  number  of  hypotheses  created  as  the  model  base  is  expanded.  However,  the 
number  of  hypotheses  created  through  indexing  remains  substantially  lower  —  there  is  a  linear  growth,  but 
with  a  very  low  constant  of  proportionality. 
2.6 Appraisal

This  system  is  an  effective  and  reliable  recognition  system,  and  demonstrates  a  number  of  features  that 
are  likely  to  be  important  in  building  the  next-generation  system: 

•  Hypothesis  combination:  simply  verifying  each  indexed  model  is  prohibitive,  particularly  for 
complex  objects  with  many  features.  Hypothesis  combination  is  an  effective  way  of  combining 
semi-local  information  from  different  parts  of  the  scene  to  obtain  a  single  recognition  hypothesis. 

•  Untrustworthy  and  expensive  verification:  verification  is  neither  cheap  nor  reliable,  as  it 
involves  back-projecting  a  large  number  of  features,  and  testing  for  distance  between  those  features 
and  possibly  unrelated  image  events.  Verification  scores  can  be  incorrectly  high,  due  to  background 
clutter  and  texture  which  leads  to  false  positives.  The  next  generation  system  must  have  more 
extensive  verification  mechanisms  using  region  properties  as  well  as  edge  geometry.  Also  much 
more  careful  analysis  of  edge  and  junction  intensity  events  must  be  carried  out  with  respect  to 
constraints  imposed  by  the  model.  For  example,  specialised  corner  detection  can  be  supervised  by 
the  model  hypothesis. 

•  A  need  for  global  scene  analysis:  in  many  cases,  ambiguities  arise  which  must  be  settled 
globally  by  a  scene  analysis  approach:  for  example,  does  a  given  image  line  come  from  object  A 
or  object  B?  Are  the  recognition  hypotheses  consistent?  The  lack  of  local  support  for  a  model 
hypothesis  can  be  augmented  by  global  relationships,  e.g.,  A  is  on  top  of  and  partially  occluding  B. 
In  this  case  we  can  predict  the  features  which  are  potentially  available  to  support  hypotheses  for 
B,  once  A  is  recognised. 

Next,  we  take  up  the  problem  of  3D  object  recognition.  First,  the  central  question  of  the  existence  of 
invariants  for  the  perspective  projection  of  general  3D  structures  is  discussed. 

3  Extending  invariant  descriptions  to  3D  structures 

Much  recent  debate  has  focused  around  a  theorem,  proven  by  a  number  of  authors  [9,  11,  42],  which 
states  that  invariants  can  not  be  measured  for  a  3D  set  of  points  in  general  position  from  a  single 
view.  The  theorem  has  frequently  been  misinterpreted  to  mean  that  no  invariants  can  be  formed  for 
three  dimensional  objects  from  a  single  image.  For  the  theorem  to  hold,  however,  the  points  must  be 
completely  unconstrained  (like  a  cloud  of  gnats).  If  a  3D  structure  is  constrained,  then  invariants  are 
available.  For  example,  six  points  constrained  to  lie  on  two  planes  in  a  “butterfly”  configuration,  as  in 
figure  8,  have  a  cross-ratio  that  can  be  measured  in  the  image.  This  is  a  projective  invariant  of  the  entire 
3D  structure,  and  not  simply  a  disguised  planar  invariant,  since  each  plane  contains  only  four  points  (five 
coplanar  points  are  required  to  form  a  plane  projective  invariant  from  points  alone). 
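
As a hedged illustration of the butterfly invariant, the sketch below carries out the construction directly in the image: the lines AB and EF are intersected with the line CD, and the cross-ratio of the four resulting collinear points is evaluated. All function names, the test points and the two camera matrices are assumptions made for this example; the cross-ratio ordering is one of several equivalent conventions.

```python
def cross(u, v):
    """Cross product: joins two points into a line, or meets two lines in a point."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return sum(x * y for x, y in zip(a, cross(b, c)))


def project(P, X):
    """Apply a 3x4 camera matrix to a 3D point, giving a homogeneous image point."""
    Xh = tuple(X) + (1.0,)
    return tuple(sum(p * x for p, x in zip(row, Xh)) for row in P)


def butterfly_cross_ratio(a, b, c, d, e, f):
    """Cross-ratio of c, d and the intersections of lines ab and ef with line cd."""
    l_cd = cross(c, d)
    p = cross(cross(a, b), l_cd)      # image of AB meeting CD
    q = cross(cross(e, f), l_cd)      # image of EF meeting CD
    o = l_cd                          # used as a reference point; it is never on l_cd
    return (det3(o, c, p) * det3(o, d, q)) / (det3(o, c, q) * det3(o, d, p))


# Hypothetical butterfly: A, B, C, D lie on the plane z = 0; C, D, E, F on the plane y = z.
A, B = (0.2, 1.0, 0.0), (0.5, 2.0, 0.0)
C, D = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
E, F = (0.3, 1.0, 1.0), (1.0, 2.0, 2.0)

# Two arbitrary uncalibrated cameras (assumed values).
P1 = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 5.0))
P2 = ((1.0, 0.2, 0.1, 0.0), (0.1, 1.0, 0.3, 0.0), (0.05, 0.02, 0.4, 5.0))

cr1 = butterfly_cross_ratio(*(project(P1, X) for X in (A, B, C, D, E, F)))
cr2 = butterfly_cross_ratio(*(project(P2, X) for X in (A, B, C, D, E, F)))
print(cr1, cr2)
```

In this example AB and EF meet CD at x = -0.1 and x = -0.4, so the 3D cross-ratio with the same ordering is 7/22, and both views should reproduce it up to rounding.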

In  fact,  a  set  of  points  in  general  position  in  space  is  a  poor  model  of  what  we  see:  the  world  is  full 
of  curves,  polyhedra,  and  surfaces;  sets  of  isolated  points  are  an  irregular  occurrence.  An  analogue  of 
the  above  “no-invariants”  theorem,  in  the  case  of  surfaces,  would  be  to  ask  whether  a  generic  surface 
has  invariants  measurable  in  a  single  image  from  its  profile.  Other  than  qualitative  descriptions  such  as 
non-convexity  (from  the  sign  of  the  profile  curvature  [32])  and  the  Euler  characteristic  (from  the  profile 
of  a  transparent  surface)  no  projective  invariant  can  be  obtained.  A  similar  result  holds  for  space  curves. 
However,  if  the  surface  satisfies  constraints,  much  can  be  recovered  from  a  single  image,  as  the  following 
section  demonstrates. 

3.1  Object  classes 

The  form  of  the  constraint  on  the  object  defines  an  object  class.  The  class  determines  both  the  process  by 
which  the  3D  invariants  are  measured  in  images,  and  the  particular  segmentation  and  grouping  strategies 
that  are  applied  during  “early  vision”.  For  example,  surfaces  of  revolution  define  a  class,  with  a  specific 
vase  or  wine  glass  being  particular  instances  of  the  class.  Projective  invariants  of  the  3D  surface  can 
be  recovered  from  the  image  profile,  and  further  the  two  matching  “sides”  of  the  profile  are  projectively 
equivalent  (section  3.5).  That  is,  one  side  can  be  mapped  onto  the  other  by  a  projective  transformation. 
Figure 8: A "butterfly" configuration of six 3D points with a projective invariant measurable from a single
perspective image. Points ABCD and CDEF lie on two planes intersecting in the line CD. The lines AB
and EF intersect the line CD generating four collinear points. This construction can be carried out in 3D
and the image to generate corresponding points. The cross-ratio of these points is the projective invariant.
Note that the planes can articulate about the line CD without altering the value of the cross-ratio. Many
analogous structures exist, e.g., if the points AB are replaced by a line.

The segmentation and grouping for this class is guided by the association of these projectively related
image contours.

It  is  important  to  distinguish  this  notion  of  geometric  class  from  the  idea  of  a  generic  type.  For 
example,  the  class  of  rotationally  symmetric  objects  is  not  the  same  as  the  generic  type  category  of  wine 
glass.  There  can  be  many  different  shapes  of  wine  glass  but  the  class  of  rotationally  symmetric  objects  is 
still  larger  and  does  not  capture  the  functional  notion  of  a  wine  drinking  container.  A  related  discussion 
of  class  is  given  by  Moses  and  Ullman  [42]  who  contrast  the  notions  of  generic  and  specific  classes  with 
regard  to  recognition  functions. 

Another  significant  aspect  of  the  class  definition  is  its  imposition  of  constraints  which  can  be  measured 
and  verified  in  the  image.  This  consideration  is  a  significant  departure  from  the  hypothesise  and  test 
paradigm  of  conventional  model-based  recognition  systems  operating  on  specific  objects.  Here,  the  class 
assumption  can  be  immediately  confirmed  without  committing  to  the  full  chain  of  recognition  processing. 
For  example,  for  a  rotationally  symmetric  object,  the  two  “sides”  of  the  image  outline  are  related  by  a 
planar  projective  transformation  (section  3.5).  This  relation  can  be  immediately  tested  when  a  pair  of 
image  profile  curves  are  hypothesised  as  belonging  to  an  object  of  class  rotationally  symmetric. 

In  the  following  sections  we  catalogue  a  number  of  object  classes  where  each  is  defined  by  an  associated 
constraint.  In  each  case  the  recovery  of  invariants  is  illustrated  and  other  geometric  consequences,  such 
as  invariant  relations,  described. 

3.2  Definitions  —  3D  projective  invariants 

In  what  follows,  we  assume  a  perspective  camera  with  unknown  internal  parameters,  and  measure  only 
projective  properties  in  the  image.  In  turn,  this  means  in  general  that  only  projective  properties  of  the 
3D  objects  can  be  recovered.  Algebraically,  the  camera  is  modelled  as  x  =  PX: 

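$$
k \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
=
\begin{bmatrix}
P_{11} & P_{12} & P_{13} & P_{14} \\
P_{21} & P_{22} & P_{23} & P_{24} \\
P_{31} & P_{32} & P_{33} & P_{34}
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\qquad (2)
$$

where $(x, y)^\top$ are image coordinates, $(X, Y, Z)^\top$ world coordinates, and $k$ is a scaling, in this case
$k = P_{31}X + P_{32}Y + P_{33}Z + P_{34}$.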

We now introduce 3D projective invariants because they are the basis for image invariants that we
can hope to recover from a single view of a constrained structure. These are invariants under projective
transformations of $\mathcal{P}^3$. A projective transformation of $\mathcal{P}^3$ can be written as:

$$
k \begin{pmatrix} X' \\ Y' \\ Z' \\ 1 \end{pmatrix}
=
\begin{bmatrix}
t_{11} & t_{12} & t_{13} & t_{14} \\
t_{21} & t_{22} & t_{23} & t_{24} \\
t_{31} & t_{32} & t_{33} & t_{34} \\
t_{41} & t_{42} & t_{43} & t_{44}
\end{bmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$

where $k$ is again the appropriate scaling to ensure the fourth coordinate is one. Fifteen parameters
are  required  to  define  the  3D  projective  transformation  matrix  up  to  an  arbitrary  scale  factor.  Thus 
five  3D  points  are  sufficient  to  construct  a  projective  coordinate  system.  A  sixth  point  will  then  have 
invariant  3D  coordinates  in  the  projective  basis  defined  by  the  other  five.  These  3D  point  invariants  can 
also  be  interpreted  as  the  cross-ratio  of  tetrahedral  volumes  computed  by  taking  determinants  of  point 
coordinates,  four  at  a  time. 

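For example, an invariant for six 3D points is given by

$$
I_{6pt}(\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, \mathbf{X}_4, \mathbf{X}_5, \mathbf{X}_6)
=
\frac{|\mathbf{X}_1\,\mathbf{X}_2\,\mathbf{X}_3\,\mathbf{X}_4| \; |\mathbf{X}_1\,\mathbf{X}_2\,\mathbf{X}_5\,\mathbf{X}_6|}
     {|\mathbf{X}_1\,\mathbf{X}_2\,\mathbf{X}_3\,\mathbf{X}_5| \; |\mathbf{X}_1\,\mathbf{X}_2\,\mathbf{X}_4\,\mathbf{X}_6|}
$$

where $\mathbf{X}_i = (X_i, Y_i, Z_i, 1)^\top$. This invariant has the familiar property of invariants that

$$
I_{6pt}(\mathbf{X}'_1, \mathbf{X}'_2, \mathbf{X}'_3, \mathbf{X}'_4, \mathbf{X}'_5, \mathbf{X}'_6)
=
I_{6pt}(\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, \mathbf{X}_4, \mathbf{X}_5, \mathbf{X}_6),
$$

i.e., both the value and the form of the expression are unaffected by the transformation.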
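
The invariance of this ratio of determinants can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names, the six test points and the matrix `T` are assumptions chosen so that no determinant vanishes.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, v in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * v * det(minor)
    return total


def bracket(*pts):
    """Determinant of four homogeneous 3D points (one point per row)."""
    return det([list(p) for p in pts])


def six_point_invariant(X):
    """Ratio of tetrahedral-volume determinants for six homogeneous points."""
    X1, X2, X3, X4, X5, X6 = X
    num = bracket(X1, X2, X3, X4) * bracket(X1, X2, X5, X6)
    den = bracket(X1, X2, X3, X5) * bracket(X1, X2, X4, X6)
    return num / den


def transform(T, X):
    """Apply a 4x4 projective transformation and rescale the fourth coordinate to one."""
    Y = [sum(t * x for t, x in zip(row, X)) for row in T]
    return [y / Y[3] for y in Y]


# Hypothetical points in general position, in the form (X, Y, Z, 1).
points = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1],
          [0, 0, 1, 1], [1, 1, 1, 1], [2, 3, 5, 1]]

# An arbitrary non-singular, non-affine transformation (assumed values).
T = [[1, 0, 0, 2], [0, 1, 0, -1], [0, 0, 1, 3], [0.1, 0.2, 0.05, 1]]

before = six_point_invariant(points)
after = six_point_invariant([transform(T, X) for X in points])
print(before, after)
```

Each point appears equally often in the numerator and the denominator, so both the rescaling by $k$ and the factor $\det T$ cancel; the two printed values should agree up to rounding.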

By  assuming  that  a  set  of  constraints  hold  among  the  3D  projective  invariants  of  a  point  set,  it 
becomes  possible  to  measure  3D  projective  invariants  in  a  single  view.  The  following  section  illustrates 
the  nature  of  these  constraints  and  provides  a  geometric  interpretation  for  the  measurable  invariants. 

3.3  Constrained  Point  Sets 

It  is  possible  in  general  to  predict  whether  invariants  of  a  three  dimensional  structure  can  be  measured 
from  images,  by  counting  the  number  of  image  measurements  available.  While  such  counting  arguments 
cannot  cover  every  degeneracy,  and  therefore  never  offer  a  proof  that  an  invariant  is  or  is  not  possible,  they 
offer  a  useful  guide  to  what  is  likely  to  be  true.  A  complication  in  counting  the  degrees  of  freedom  of  a 
geometric  configuration,  and  the  number  of  parameters  of  a  transformation,  is  the  existence  of  isotropies. 
An  isotropy  is  an  action  of  a  transformation  which  does  not  alter  the  geometry  of  a  configuration.  For 
example,  translation  along  the  tangent  direction  of  a  line  or  rotation  about  the  centre  of  a  circle  does  not 
affect  the  structure.  Therefore  an  isotropy  reduces  the  effective  number  of  transform  parameters,  and 
generally  increases  the  number  of  invariants. 

Consider a 3D configuration M. Plane projective invariants are denoted $I_2$, and projective invariants
of 3D are denoted $I_3$. Then,

For m perspective images of M, if there is no isotropy group acting in $\mathcal{P}^3$, then to recover
$n_{I_3}$ functionally independent invariants of the three-dimensional structure M from image
information alone, the following inequality must be satisfied:

$$ m \times n_{I_2} \ge n_{I_3} + 3m $$

where $n_{I_2}$ is the number of functionally independent plane projective invariants of the image
of M. If there is an isotropy group of dimension $\dim I_s$ acting, then (provided $\dim I_s < 3$)
the following inequality must be satisfied:

$$ m \times n_{I_2} \ge n_{I_3} + m(3 - \dim I_s) $$

We sketch the reasoning when there is no isotropy group acting for the case of a single image. The
image projective invariants are functions only of the projective invariants of the configuration consisting of
M taken together with the optical centre, O. To see this, consider a projective transformation of $\mathcal{P}^3$. This
projectively distorts the {M, O} configuration and the image plane. However, the image plane geometry
is transformed by only a plane projective transformation. This means that the projective invariants of
both the image configuration and the 3D configuration are unaffected. The image projective invariants
can depend only on the rays linking O to points of M, and depend, therefore, on the optical centre O as
well as on M. Since the image projective invariants are unaffected by the position of the image plane,
the relationship between the 2D projective image invariants and the 3D projective object invariants is a
function only of the three unknown coordinates of the centre of projection. Provided there are three or
more such image invariants, it is possible (in principle) to eliminate the (unknown) contribution of the
optical centre.

                         A          B          C
DOF                      16         14         12
$\dim I_s$               0          2          4
$n_{I_3}$                1          1          1
$n_{I_2}$                4          2          1
dof of O that matter     3          1          0
Counting relation        4 = 1 + 3  2 = 1 + 1  1 = 1 + 0

Table 2: Examples of the counting argument for various butterfly-like structures. A = 6 point butterfly; B =
butterfly with two points of wing replaced by line (4 points, one line); C = butterfly with lines on both wings
(2 points, 2 lines).

The  counting  argument  then  simply  relates  the  number  of  unknowns  and  the  number  of  measurements: 
in m views there are 3m unknowns for the optical centres, and $n_{I_3}$ unknown 3D projective invariants
for the configuration M; the number of measurements is $n_{I_2}$ in each of the m images. Note that, like
most  such  arguments,  the  condition  is  necessary  but  may  not  be  sufficient;  this  means  that  there  could 
be  cases  where  the  counting  argument  indicates  that  invariants  can  be  measured  from  the  image,  when 
in  fact  they  cannot.  The  significance  of  the  argument  is  that  it  indicates  where  a  further  analysis  may 
be  useful. 

As  an  example,  consider  the  case  of  six  points  in  space  which  have  three  3D  projective  invariants,  as 
discussed  above.  If  we  specify,  or  assume,  the  values  for  two  of  the  invariants,  then  we  can  compute  the 
value  of  the  third  from  a  single  image.  The  so-called  butterfly  configuration  in  figure  8  is  an  example 
where  we  assume  that  two  of  the  3D  invariants  are  zero,  which  corresponds  to  the  coplanarity  of  two 
sets  of  four  points  in  the  six  point  configuration.  The  counting  argument  goes  as  follows:  the  number 
of degrees of freedom for the image points is 12 (2 for each point) less 8 for the plane projective group
gives $n_{I_2} = 4$. For 6 points in space on two planes there are 16 degrees of freedom (3 for each point, less
two for the planarity constraints) less 15 for the 3D projective group, gives $n_{I_3} = 1$. There are also three
unknown coordinates of the centre of projection. Thus the counting argument shows that the unknown
3D invariant can be measured in a single view, i.e.,

$$ 1 \times 4 \ge 1 + 1 \times 3. $$

Similar  counts  for  a  number  of  butterfly  analogues  in  the  case  of  isotropies  are  given  in  table  2.  Sparr  [63] 
has constructed many other examples of butterfly-like configurations, and provides a method for
generating such invariants algebraically.
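
The counting relation is easy to mechanise. The sketch below is a hypothetical helper, not part of the original system; it clamps the number of relevant optical-centre unknowns at zero, which is an assumption read off column C of table 2 rather than a statement made in the text. It checks a necessary condition only.

```python
def invariants_measurable(m, n_i2, n_i3, dim_is=0):
    """Necessary (not sufficient) counting condition for recovering 3D invariants.

    m      -- number of perspective images
    n_i2   -- independent plane projective invariants measured in each image
    n_i3   -- independent 3D projective invariants of the configuration
    dim_is -- dimension of the isotropy group (0 if none)
    """
    centre_unknowns = max(0, 3 - dim_is)   # assumed clamp, following table 2
    return m * n_i2 >= n_i3 + m * centre_unknowns


# Six unconstrained points: 12 - 8 = 4 image invariants, 18 - 15 = 3 invariants in 3D.
general_six_points = invariants_measurable(1, 4, 3)

# Columns A, B and C of table 2, each for a single view.
butterfly_a = invariants_measurable(1, 4, 1, dim_is=0)
butterfly_b = invariants_measurable(1, 2, 1, dim_is=2)
butterfly_c = invariants_measurable(1, 1, 1, dim_is=4)

print(general_six_points, butterfly_a, butterfly_b, butterfly_c)
```

The unconstrained case fails the test, in agreement with the no-invariants theorem for points in general position, while the three butterfly variants satisfy it with equality.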

The  counting  argument  is  used  in  this  manner  to  focus  attention  on  configurations  where  invariants 
may  be  available.  As  a  further  example,  consider  recognising  algebraic  surfaces  from  their  profiles.  In  this 
case, the surface has degree $d$, and has $[\tfrac{1}{6}(d+3)(d+2)(d+1) - 1] - 15$ functionally independent projective
invariants. The profile has degree $d(d-1)$, and has $[\tfrac{1}{2}(1 - d + d^2)(2 - d + d^2) - 1] - 8$ functionally
independent projective invariants. For $d > 2$, the number of invariants of the profile substantially
exceeds  the  number  of  invariants  of  the  surface,  and  so  it  is  reasonable  to  expect  to  recover  invariants 
from  the  profile  of  an  algebraic  surface.  In  fact,  such  invariants  can  be  recovered,  though  the  procedure 
is  complicated;  details  are  given  in  [22]. 
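
To see how quickly the profile count outgrows the surface count, the two expressions can be evaluated for small degrees. This is a sketch with hypothetical function names; it only evaluates the formulas quoted above.

```python
def surface_invariants(d):
    """Coefficients of a degree-d surface, less overall scale, less 15 for the 3D projective group."""
    return (d + 3) * (d + 2) * (d + 1) // 6 - 1 - 15


def profile_invariants(d):
    """Coefficients of the degree d(d-1) profile curve, less scale, less 8 for the plane projective group."""
    n = d * (d - 1)
    return (n + 1) * (n + 2) // 2 - 1 - 8


for d in (3, 4, 5):
    print(d, surface_invariants(d), profile_invariants(d))
```

For a cubic surface this gives 4 surface invariants against 19 for its degree-6 profile, and the gap widens with the degree.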

3.4 Repeated Structure

Structures  that  repeat  in  a  single  image  of  a  scene  are  equivalent  to  multiple  views  of  a  single  instance  of 
the  structure.  Thus,  for  example,  a  view  of  two  similar  cars  in  a  car  park  where  the  cars  are  parked  within 
translations  of  one  another,  is  equivalent  to  a  stereo  pair  of  images  of  one  such  car,  with  the  cameras 
related  by  a  pure  translation.  The  3D  shape  of  the  car  can  be  recovered  by  the  familiar  techniques  of 
stereopsis.  More  formally, 
A repeated structure is defined by a geometric structure S, and a 3D transformation T, which
generates a transformed copy of S, i.e., S' = T(S). Both S and S' are viewed in the same
perspective image.

In  many  cases  the  internal  calibration  parameters  of  the  camera  will  be  unknown.  In  this  case  a  single 
image  of  a  repeated  structure  is  mathematically  identical  to  an  uncalibrated  stereo  pair  where  the  two 
cameras  are  related  by  the  transformation  between  S  and  S'.  It  has  been  shown  by  Faugeras  [16] 
and Hartley et al. [29] that if one carries out stereo reconstruction from two uncalibrated perspective
images,  the  reconstruction  can  differ  from  the  actual  3D  Euclidean  geometry  of  the  object  by  a  3D 
projective  transformation.  Thus,  3D  projective  invariants  of  this  recovered  structure  have  the  same  value 
as  projective  invariants  measured  on  the  actual  Euclidean  structure. 

The  equivalence  with  stereo  means  that  epipolar  structure  can  be  defined  within  a  single  image  and 
represents  the  geometric  relationship  between  corresponding  features  on  the  object  copies.  As  a  simple 
example,  consider  the  case  where  T  is  a  3D  translation,  i.e.,  S  and  S'  are  related  by  a  simple  3D 
translation. In this case, it can be shown [41] that affine, rather than projective, 3D structure can be
recovered.  Lines  joining  corresponding  points  on  S  and  S'  are  parallel  in  3D  and  are  imaged  as  a  set 
of  lines  converging  to  a  vanishing  point.  These  imaged  correspondence  lines  and  vanishing  point  are  the 
analogue  of  “epipolar  lines”  and  “epipole”,  and  these  terms  will  be  used  from  now  on.  For  translation 
only,  there  is  a  single  epipole  and  corresponding  points  in  S  and  S'  lie  on  the  same  epipolar  line.  We 
call this convenient correspondence relation auto-epipolar correspondence. This correspondence relation
is  an  example  of  the  more  general  idea  that  repeated  3D  geometric  structure  imposes  2D  constraints  on 
corresponding  image  features  which  can  be  used  to  advantage  in  grouping  and  verification. 

We  reserve  the  epipolar  terminology  for  the  case  where  the  centres  of  projection  of  the  two  cameras 
are  displaced.  For  some  repeated  structures,  the  transformation  between  S  and  S'  does  not  alter  the 
camera  centre  and  thus  does  not  yield  an  epipolar  structure.  However,  it  is  still  possible  to  construct  a 
correspondence  structure  in  the  image  which  is  not  an  epipolar  geometry  but  has  many  similar  advantages. 
An  example  of  this  is  given  in  section  3.5  for  surfaces  of  revolution. 

Thus  we  have  two  recurring  issues  that  arise  in  the  context  of  repeated  structures  and  in  most  cases 
where  object  class  produces  invariants  that  can  be  measured  in  a  single  image  view: 

1. The computation of image-measurable 3D invariants,

2.  The  correspondence  relationship  between  the  imaged  features  of  the  3D  structure,  and  associated 
grouping  strategies. 

In  the  next  few  sections  we  review  some  mature  examples  of  repeated  structure  [36,  47]  where  the 
discussion  is  organised  around  these  two  issues. 

3.4.1  Bilateral  Symmetry 

In  the  case  of  a  single  bilateral  symmetry,  the  repeated  structure  is  the  half  object  on  one  side  of  the 
symmetry  plane.  A  single  camera  imaging  a  bilaterally  symmetric  object  is  equivalent  to  two  identical 
cameras,  viewing  the  half  structure,  where  one  camera  is  transformed  to  the  other  by  a  reflection  in  the 
object symmetry plane. A similar observation was made by [26, 39], though in the context of a calibrated
camera.  Below  we  give  examples  of  3D  point  sets  and  space  curves  with  a  single  bilateral  symmetry. 

3D  geometry  Lines  joining  corresponding  points  (on  either  side  of  the  symmetry  plane)  are  parallel 
and  orthogonal  to  the  plane  of  symmetry.  There  is  a  natural  coordinate  system  provided  by  these 
correspondence directions and the symmetry plane (figure 9). The correspondence lines intersect the
symmetry  plane  at  the  midpoint  of  the  corresponding  points. 

Measuring Invariants  Perspective projection does not preserve mid-points. However, the images
of the 3D midpoints can be computed (see below). All 3D mid-points are co-planar (they lie on the
symmetry  plane).  There  is  a  projective  transformation  between  the  set  of  imaged  mid-points  and  the  3D 
points  on  the  plane  of  symmetry.  Thus  planar  projective  invariants  can  be  measured  in  the  image  from 
the  computed  midpoints. 
Figure 9: The natural coordinate frame for an object with bilateral symmetry. The XY plane is the plane of
reflection,  and  the  Z  axis  is  parallel  to  lines  joining  corresponding  points.  Note,  all  mid-points  are  coplanar. 


The  image  of  the  3D  midpoints  can  be  computed  using  a  property  of  equally  spaced  points  (see 
[64]):  three  collinear  points,  separated  by  the  same  distance,  and  taken  with  a  point  at  infinity  have  a 
harmonic  cross-ratio.  Since  the  point  at  infinity  on  the  line  joining  two  corresponding  points  is  imaged 
as  a  vanishing  point  (see  below,  and  figure  10),  it  can  be  observed.  Thus,  the  position  of  the  midpoint  in 
the  image  can  be  computed  from  the  image  coordinates  of  the  corresponding  points  and  from  the  image 
coordinates  of  the  vanishing  point.  Furthermore,  since  computing  a  point  that  has  a  fixed  cross-ratio 
with  respect  to  three  other  points  is  linear,  there  is  a  unique  solution.  Other  geometric  methods  for 
computing  the  imaged  mid-point  are  available,  based  on  point  pairs  or  triplets. 
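
The harmonic construction can be written in a few lines. In the sketch below (function names and test values are assumptions), the vanishing point is expressed as a combination v = αa + βb of the two imaged points, and the imaged midpoint is then αa − βb, the harmonic conjugate of v with respect to a and b. The computation is linear in the inputs and does not depend on how a and b are scaled.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def det3(a, b, c):
    return sum(x * y for x, y in zip(a, cross(b, c)))


def imaged_midpoint(a, b, v):
    """Harmonic conjugate of the vanishing point v with respect to image points a, b."""
    o = cross(a, b)               # the line ab, reused as a reference point off that line
    alpha = det3(o, v, b)         # proportional to the coefficient of a in v
    beta = det3(o, a, v)          # proportional to the coefficient of b in v
    return tuple(alpha * ai - beta * bi for ai, bi in zip(a, b))


def project(P, Xh):
    """Apply a 3x4 camera matrix to a homogeneous 3D point or direction."""
    return tuple(sum(p * x for p, x in zip(row, Xh)) for row in P)


def normalise(x):
    return (x[0] / x[2], x[1] / x[2])


# Hypothetical camera and a symmetric point pair about the plane Z = 0.
P = ((1.0, 0.2, 0.1, 0.0), (0.1, 1.0, 0.3, 0.0), (0.05, 0.02, 0.4, 5.0))
a = normalise(project(P, (1.0, 2.0, -1.5, 1.0))) + (1.0,)
b = normalise(project(P, (1.0, 2.0, 1.5, 1.0))) + (1.0,)
v = project(P, (0.0, 0.0, 1.0, 0.0))           # vanishing point of the Z direction

estimated = normalise(imaged_midpoint(a, b, v))
expected = normalise(project(P, (1.0, 2.0, 0.0, 1.0)))   # image of the true 3D midpoint
print(estimated, expected)
```

Note that the estimated point is not the Euclidean midpoint of a and b in the image, which is the point of the construction.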

The  3D  structure  can  be  reconstructed  up  to  a  projective  ambiguity,  based  on  the  equivalence  with 
uncalibrated  stereo  [16,  29].  3D  projective  invariants  can  be  measured  from  this  recovered  structure.  In 
the  case  of  bilateral  symmetry,  structure  is  recovered  to  better  than  a  projective  ambiguity  because  of 
the  orthogonality  constraints  available  in  the  “natural”  coordinate  frame  [18,  60]. 

Correspondence  and  Grouping  The  epipolar  structure  in  this  case  arises  from  the  parallel  lines 
joining corresponding points (on each side of the object). Under perspective projection, these correspondence
lines (the epipolars) image to a family of lines converging to a single vanishing point (the epipole).
The epipole can be determined using two pairs of corresponding points (figure 10). Once the epipole has
been computed, further correspondences are found by a 1D search on the epipolar line. This is a recurring
theme  —  the  constraints  that  define  the  object  class  not  only  show  how  invariants  may  be  recovered,  but 
also  facilitate  and  direct  the  image  grouping. 
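
A minimal sketch of this grouping step, under the assumption of noise-free homogeneous image points: the epipole is the intersection of the two lines joining the known correspondences, and a candidate partner for a new point is accepted only if it lies close to the epipolar line through that point. The function names and the synthetic camera are illustrative.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def epipole(p1, q1, p2, q2):
    """Intersection of the lines joining two pairs of corresponding image points."""
    return cross(cross(p1, q1), cross(p2, q2))


def epipolar_line(e, p):
    """Line through the epipole and an image point; the partner of p lies on it."""
    return cross(e, p)


def point_line_distance(l, x):
    """Perpendicular image distance from homogeneous point x to line l."""
    return abs(l[0] * x[0] + l[1] * x[1] + l[2] * x[2]) / (
        abs(x[2]) * (l[0] ** 2 + l[1] ** 2) ** 0.5)


def project(P, Xh):
    return tuple(sum(p * x for p, x in zip(row, Xh)) for row in P)


# Hypothetical camera and three point pairs mirrored in the plane Z = 0.
P = ((1.0, 0.2, 0.1, 0.0), (0.1, 1.0, 0.3, 0.0), (0.05, 0.02, 0.4, 5.0))
pairs = [((1.0, 2.0, 1.5), (1.0, 2.0, -1.5)),
         ((-1.0, 0.5, 2.0), (-1.0, 0.5, -2.0)),
         ((0.5, -1.0, 1.0), (0.5, -1.0, -1.0))]
image_pairs = [(project(P, s + (1.0,)), project(P, t + (1.0,))) for s, t in pairs]

(p1, q1), (p2, q2), (p3, q3) = image_pairs
e = epipole(p1, q1, p2, q2)
residual = point_line_distance(epipolar_line(e, p3), q3)
print((e[0] / e[2], e[1] / e[2]), residual)
```

With real data the residual would be compared against a noise threshold, and the search for the partner of p3 restricted to edge points along its epipolar line.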

We  demonstrate  two  examples  of  bilateral  symmetry,  a  polyhedral  point  set  and  a  space  curve. 

1.  3D  polyhedron:  figure  11  shows  different  views  of  the  3D  reconstruction  of  a  stapler  obtained 
from  a  single  view.  The  reconstruction  is  placed  in  a  Euclidean  frame  to  give  a  normal  presentation 
of  the  object  shape,  but  any  projective  frame  could  be  used. 

2.  A  space  curve:  corresponding  points  on  the  two  imaged  space  curves  are  determined  using  the 
epipolar  geometry.  Four  different  views  of  the  reconstruction  for  the  outline  of  a  spoon  are  shown 
in  figure  12  (the  3D  projective  representation  has  again  been  constrained  to  lie  in  a  believable 
Euclidean  frame). 
Figure 10: (a) The epipole can be located using the intersection of lines between two corresponding points
on a bilaterally symmetric object (the points are marked by solid circles). Epipolars can then be constructed
through the epipole to aid correspondence. (b) Typical corresponding points determined in this manner.


Figure  11:  Three  dimensional  structure  is  recovered,  modulo  a  projectivity,  from  the  single  view  of  the  points 
marked  on  the  stapler  in  figure  10b.  Four  typical  views  are  shown  with  only  (a)  at  a  viewpoint  close  to  that 
of the original image. Note the collinearity of the line segments in (b); this demonstrates the accuracy of the
recovered  structure. 
Figure 12: A single view of an object with a plane of bilateral symmetry, such as a teaspoon, is sufficient to
allow a full 3D projective reconstruction. Only two pairs of distinguished points are needed for the approach;
these are recovered from surface markings and can be used to determine the epipolar structure of the image.
Using  this  epipolar  structure  an  arbitrary  number  of  correspondences  can  be  produced.  Four  different  views 
are  shown  of  the  3D  reconstruction  computed  from  the  image.  The  construction  works  very  well:  note  the 
planarity  of  the  handle  recovered  in  (b),  and  the  full  3D  shape  in  all  of  the  images. 


The  reasoning  outlined  above  can  be  applied  to  objects  with  more  than  one  bilateral  symmetry  [18] 
and  to  objects  projectively  equivalent  to  ones  with  bilateral  symmetry. 

3.4.2  Translational  Repetition 

In  this  case  the  structures,  S  and  S',  are  related  by  a  3D  translation.  As  described  above,  the  structure 
of S can be recovered up to an affine ambiguity, from a single image of the duplicate structure.

Measuring Invariants  Affine invariants are computed from the perspective image in three stages.
First, structure is recovered up to a projective ambiguity using uncalibrated stereo. Second, the plane
at infinity [64] is determined in this projective coordinate frame as follows: a line on S is parallel to its
counterpart on S', so the intersection of corresponding lines is a point P on the plane at infinity. The
image, p, of P is computed by intersecting the imaged corresponding lines. Since p lies on both lines its
3D position, P, can be determined by stereo. Three such points determine the plane at infinity. Third,
the structure is projectively transformed such that the plane at infinity has the standard form $X_4 = 0$.
The structure is then known up to an affine ambiguity, and affine invariants measured from this structure
have  the  same  value  as  invariants  measured  on  the  3D  Euclidean  structure  S. 
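
The second and third stages can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names and the three sample points are hypothetical, and the map used in the third stage is just one convenient choice among the many projective transformations that send the recovered plane to $X_4 = 0$.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def plane_through(p, q, r):
    """Homogeneous plane (4-vector) through three homogeneous 3D points.

    Component i is the signed 3x3 minor obtained by deleting column i,
    which makes the result orthogonal to p, q and r.
    """
    rows = [p, q, r]
    plane = []
    for i in range(4):
        minor = [[row[j] for j in range(4) if j != i] for row in rows]
        plane.append((-1) ** i * det3(minor))
    return plane


def affine_upgrade(plane):
    """A 4x4 projective map sending `plane` to X4 = 0.

    Assumes plane[3] != 0, so that the map is invertible.
    """
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], list(plane)]


def transform(h, x):
    """Apply a 4x4 map to a homogeneous 3D point."""
    return [sum(h[i][j] * x[j] for j in range(4)) for i in range(4)]


# Hypothetical points at infinity, expressed in the projective frame
# (each would come from intersecting a line on S with its copy on S').
points_at_infinity = [[1, 1, 0, -1], [1, 0, 1, 0], [0, 1, 2, 0]]

pi_inf = plane_through(*points_at_infinity)
H = affine_upgrade(pi_inf)
upgraded = [transform(H, P) for P in points_at_infinity]
```

After the upgrade the fourth coordinate of each of the three points is zero, i.e., they lie on the plane $X_4 = 0$ as required.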

Correspondence  and  Grouping  As  in  the  bilateral  symmetry  case,  lines  joining  corresponding  3D 
points  are  parallel.  The  image  correspondence  is  again  auto-epipolar  (all  corresponding  lines  intersect  in 
a  single  epipole). 
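
The auto-epipolar constraint can be illustrated with homogeneous coordinates, where the line joining two image points and the intersection of two image lines are both given by a cross product. The sketch below is an illustration only; the function names and the sample correspondences are assumptions, and exact integer coordinates are used so that the incidence test needs no tolerance (measured data would need one).

```python
def cross(a, b):
    """Cross product of two homogeneous 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def epipole(pair_1, pair_2):
    """Intersection of the lines joining two pairs of corresponding points."""
    return cross(cross(*pair_1), cross(*pair_2))


def is_consistent(e, x, x_copy):
    """True if the line joining x and its copy passes through the epipole e."""
    return dot(e, cross(x, x_copy)) == 0


# Hypothetical correspondences between a structure and its translated copy.
e = epipole(([0, 0, 1], [5, 2, 1]), ([0, 4, 1], [5, 4, 1]))
good = is_consistent(e, [2, 0, 1], [6, 2, 1])   # joins through the epipole
bad = is_consistent(e, [2, 0, 1], [6, 3, 1])    # does not
```

Two correspondences fix the epipole; every further candidate pair can then be accepted or rejected by a single incidence test, which is what makes the constraint useful for grouping.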

As  an  example,  invariants  are  calculated  from  the  images  of  two  translated  polyhedral  structures 
shown in figure 13. The translation between the duplicated structures differs in each case, demonstrating
that the invariants are associated with the structure itself, i.e., S. Affine invariants are computed for the
3D vertex positions which are computed using the epipolar geometry of the translated copies. The values
of the invariants are given in table 3. The differences between the invariants computed from each image
are small, even though both the translation vector between the speakers and the viewpoint vary significantly.
In  many  cases  of  such  repeated  structures,  the  object  copies  are  rigidly  connected,  but  this  example 
illustrates  that  the  affine  invariants  are  independent  of  the  translation  vector  as  well. 




Figure  13:  One  object  (a  speaker)  repeated  under  translation.  The  epipolar  correspondence  lines  for  image 
(a)  are  shown  in  (b).  The  translation  vector  is  different  for  images  (a)  and  (c)  and  the  same  between  (c)  and 
(d).  Affine  invariants  computed  from  these  images  are  compared  in  table  3. 


             affine coordinates of the corner
image a      -0.2249    -0.0642    1.2833
image c      -0.2324    -0.0685    1.2979
image d      -0.2317    -0.0626    1.2849

Table  3:  Comparison  of  3D  affine  invariants  computed  for  the  speaker  from  figure  13.  The  invariant  is  the 
3D  position  of  one  corner  of  the  speaker  in  an  affine  frame  defined  by  four  other  points  on  the  speaker.  The 
values  are  fairly  stable,  even  though  the  images  have  different  translation  vectors  and  viewpoints. 


3.4.3  Other  Repeated  Structures 

The  notion  of  structure  repetition  under  a  transformation  is  extensible  to  more  general  situations.  It  is 
not necessary that the copies be Euclidean equivalent and repeated under translation. The copy transformation
can be a full 3D projective transformation whilst still preserving an epipolar correspondence
within  the  image.  The  3D  reconstruction  of  the  object  geometry  is  then  known  only  up  to  a  3D  projective 
transformation  of  space. 

In  the  case  that  there  are  three  or  more  Euclidean  equivalent  structures,  the  geometry  can  be  recovered 
up  to  a  3D  similarity.  This  follows  from  the  equivalence  of  this  case  to  three  views  of  a  single  object 
taken  with  an  identical  camera,  where  it  has  been  demonstrated  that  structure  can  be  recovered  up  to  a 
3D  similarity  [17]. 

It  is  also  interesting  to  speculate  about  approximately-repeated  structures.  Suppose  that  the  structure 
is not repeated according to a rigid 3D transformation but is only an approximation to such a transformation.
This approximate repetition occurs in natural objects such as animals and vegetation. It seems
that  the  invariants  which  one  can  compute  from  an  idealised  form  of  the  approximate  repetition  will  not 
be  very  far  from  an  invariant  description  of  the  actual  structure.  For  example  in  a  bunch  of  grapes,  it  can 
be assumed that each grape is a copy of the other under an affine transformation or perhaps even a scaled
Euclidean transformation. Another example is texture, which can be thought of as a statistical repeated
structure.

Figure 14: The profile of a surface of revolution is projectively equivalent to two curves with bilateral symmetry.
Under a projective transformation parallel correspondences (left) converge to a vanishing point (right).
Corresponding points x ↔ x' are related in this case by a particular plane projective transformation, T, called
a planar harmonic homology. The transformation has a line of fixed points, the image of the axis of symmetry,
which results from two of the eigenvalues of T being equal. There is also a fixed point, e1, not on the line,
called the centre of the homology, which defines correspondences between symmetrical points on each side of
the contour. That is, corresponding point pairs and the centre of the homology are collinear. The cross-ratio
of e1, the corresponding points x, x', and the intersection of their join with the axis, is harmonic. (The line
of fixed points is e2 × e3, where e2 and e3 are the eigenvectors with equal eigenvalues. The third eigenvector,
e1, is distinct and non-zero, and is the centre for a pencil of fixed lines.)

3.5  Rotational  Symmetry 

Surfaces of revolution have received considerable attention, though generally with calibrated cameras [13], or
as  a  special  case  of  a  generalised  cylinder  [53,  72]. 

Profile Geometry The image curves forming the two “sides” of the profile are related by a plane
projective transformation, T, with the property that $T^2 = I$. Such a projective transformation is called
a  planar  harmonic  homology  [64].  It  arises  in  this  case  because  the  image  transformation  is  a  conjugate 
reflection  (whose  conjugating  element  is  a  projective  transformation).  To  see  this,  construct  the  plane 
containing  the  axis  of  the  surface  and  the  optical  centre.  The  surface  then  has  a  mirror  symmetry  in  this 
plane,  as  does  the  cone  of  rays  through  the  optical  centre  and  tangent  to  the  surface.  This  cone  yields 
the  profile  when  it  is  intersected  with  the  image  plane.  Clearly,  the  contour  generators  are,  in  general, 
space  curves,  related  by  a  mirror  symmetry  in  space.  If  the  image  plane  is  perpendicular  to  the  plane  of 
symmetry, then the profile has a mirror symmetry; the profile in any other image plane is related to the
profile in this perpendicular plane by a projective transformation.

In  this  case,  there  is  no  epipolar  geometry  defined,  since  reflection  in  the  symmetry  plane  does  not 
move  the  optical  centre.  However,  a  correspondence  relation  still  exists  and  is  generated  by  the  planar 
homology  between  the  opposing  sides.  A  planar  harmonic  homology  (see  figure  14)  is  a  special  case  of  a 
planar  homology  (see  figure  20b)  for  which  the  characteristic  invariant  of  the  homology  is  harmonic  [64]. 
For  planar  homologies  there  is  a  fixed  point  which  is  the  centre  of  a  pencil  of  fixed  lines  which  define 
correspondence  pairs.  That  is,  corresponding  points  lie  on  the  same  line  of  the  pencil,  in  the  same  manner 
as  the  epipolar  geometry  of  translated  cameras. 

Measuring Invariants The intersections of “corresponding” profile bitangents lie on the projection
of the object’s axis. The image intersection points are projections of the intersection points between
planes bitangent to the surface and the 3D object axis. Each such point is viewpoint independent. This is
shown schematically in figure 15. Four such points are sufficient to measure a cross-ratio (the points are
collinear in space, all lying on the axis of rotational symmetry). In this manner a projective invariant
(the cross-ratio) is associated with the surface [20, 35, 45].

Figure 15: A rotationally symmetric object, and the planes bitangent to the object and passing through the
optical centre, are shown. It is clear from the figure that the intersection of these planes is a line, also passing
through the optical centre. Each plane appears as a line in the image: the intersection of the planes appears
as a point, p, which is the image of the point, P, at which the bitangent planes intersect the axis of symmetry.

Figure 16: Five perspective images of a surface of revolution at different inclinations. The invariant values are
given in table 4.

The  construction  extends  to  straight  homogeneous  generalised  cylinders  (SHGCs)  [3,  35].  Again,  the 
intersections  of  corresponding  profile  bitangents  correspond  to  a  viewpoint-invariant  3D  point,  and  are 
collinear,  so  cross-ratios  can  be  formed.  Figure  16  shows  images  of  a  surface  of  revolution  with  the 
calculated  invariants  given  in  table  4. 
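
The invariance being exploited can be checked numerically. In the sketch below the four axis points are represented by a parameter along the axis, and the effect of a perspective view on the axis is modelled as a 1D projective map t -> (at + b)/(ct + d). The function names, the sample positions and the map coefficients are assumptions made for illustration, and the ordering convention chosen for the cross-ratio is one of several in common use.

```python
from fractions import Fraction


def cross_ratio(t1, t2, t3, t4):
    """Cross-ratio {t1, t2; t3, t4} of four collinear points given by a line parameter."""
    return Fraction((t1 - t3) * (t2 - t4), (t1 - t4) * (t2 - t3))


def projective_map_1d(t, a, b, c, d):
    """Image of parameter t under the 1D projective map (a t + b) / (c t + d)."""
    return Fraction(a * t + b, c * t + d)


# Hypothetical positions of four bitangent-plane intersections along the 3D axis.
axis_points = [0, 1, 3, 7]
before = cross_ratio(*axis_points)

# The same four points as they might appear along the imaged axis.
imaged = [projective_map_1d(t, 2, 1, 1, 5) for t in axis_points]
after = cross_ratio(*imaged)
```

Here `before` and `after` are equal although the individual positions, and hence any length ratio formed from them, change under the map.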

Correspondence  and  Grouping  As  described  above,  the  profile  of  a  rotationally  symmetric  surface 
can  be  separated  into  two  “sides”,  which  are  related  by  a  planar  harmonic  homology,  T.  There  are  a 
number  of  consequences  of  this  result: 

1. The two sides of the profile can be grouped by associating curves which are projectively equivalent,
for example by matching projectively equivalent concavity curves. This correspondence can be
achieved automatically by the planar recognition system described in section 2.

2. If the projective transformation between two projectively related curves is not a harmonic homology,
then the grouped curves can be ruled out as arising from a surface of revolution. This is simply
tested by checking whether $T^2 = I$ (up to scale, since T is a homogeneous matrix).

angle    cross-ratio    length ratio
45.0     0.486187       1.40862
40.0     0.490561       1.98153
35.0     0.486796       2.14017
25.0     0.486640       2.38409
15.0     0.486260       2.70539
 0.0     0.494849       4.13687

Table 4: Stability of invariants for a surface of revolution. The invariants are computed from measured points.
The angle is the inclination of the axis of the lamp-base to the camera plane (figure 16). Typical affine (length
ratio) and projective (cross-ratio) invariants are shown. Note that the value of the affine invariant changes at
extreme angles, whereas the projective invariant remains stable to two significant figures.

3.  Under  real  imaging  conditions  the  transformation  T  relating  the  two  sides  of  a  profile  will  be  close 
to  affine.  This  quasi-invariant  condition  can  be  used  in  two  ways:  first,  lines  joining  corresponding 
points  on  the  two  sides  of  the  profile  will  be  almost  parallel.  Second,  relative  (not  scalar)  affine 
invariants  can  be  used  to  match  concavity  curves  [43]. 

4. T provides point-to-point correspondence between the sides of the profile, which can be used to
disambiguate bitangent matches. This correspondence can also be used to repair missing profile portions,
filling in gaps by transforming over points from the other side of the profile.

5.  The  projected  axis  can  be  determined  directly  from  the  projectivity  as  a  line  of  fixed  points  of  the 
homology  [64]. 
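
Consequences 2 and 5 can be illustrated with a small sketch. It builds a harmonic homology from a centre v and an axis a using the standard form T = (a.v) I - 2 v a^T, then applies the involution test. The function names and the sample centre and axis are assumptions; integer entries are used so the test is exact, whereas a transformation estimated from image curves would need a tolerance.

```python
def harmonic_homology(v, a):
    """3x3 matrix of the harmonic homology with centre v and axis a (a.v != 0)."""
    s = sum(ai * vi for ai, vi in zip(a, v))
    return [[(s if i == j else 0) - 2 * v[i] * a[j] for j in range(3)]
            for i in range(3)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(T, x):
    return [sum(T[i][j] * x[j] for j in range(3)) for i in range(3)]


def is_harmonic(T):
    """True if T squared is a non-zero multiple of the identity.

    The identity map itself also passes, so in practice one would
    additionally require that T is not a multiple of the identity.
    """
    sq = matmul(T, T)
    off_diagonal_zero = all(sq[i][j] == 0
                            for i in range(3) for j in range(3) if i != j)
    return off_diagonal_zero and sq[0][0] == sq[1][1] == sq[2][2] != 0


centre, axis = [3, 1, 1], [1, -2, 4]          # hypothetical values
T = harmonic_homology(centre, axis)
general = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]   # a homology that is not harmonic

on_axis = [2, 1, 0]                           # satisfies axis . x = 0
image_of_on_axis = apply(T, on_axis)          # a multiple of on_axis
```

Points on the axis are mapped to multiples of themselves, which is the sense in which the projected axis is a line of fixed points of the homology.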

To  illustrate  the  power  of  this  grouping  constraint,  figure  17  shows  an  image  with  many  surfaces  of 
revolution  of  various  types  and  sizes.  The  matched  concavities  are  partitioned  into  sets,  and  the  profile 
curves  corresponding  to  each  set  are  grouped.  The  entire  process  is  automatic  and  relies  only  on  the 
properties  of  the  homology  between  symmetrical  portions  of  the  profile. 

3.6  Canal  Surfaces 

A  canal  surface  is  the  parallel  surface  of  a  space  curve.  It  is  the  locus  of  points  which  are  a  fixed 
perpendicular  distance  from  the  curve.  Equivalently  it  can  be  generated  as  the  envelope  of  a  sphere 
swept  with  the  centre  on  the  curve.  Common  examples  are  pipes  or  tubes  such  as  occur  in  plumbing.  In 
the  following  we  consider  canal  surfaces  for  which  the  generating  curve,  a,  is  planar.  For  such  surfaces 
we  have: 

Under  general  viewing  conditions,  an  inflection  in  the  generating  curve  gives  rise  to  two 
inflections  in  the  profile,  one  on  either  “side”.  The  tangents  on  the  contour  generator  at  the 
pre-images of the profile inflections, and the tangent at the generating curve inflection, are
parallel. 

The  consequence  of  this  is  that  tangents  at  the  paired  profile  inflections  intersect  in  the  vanishing  point 
of  the  generating  curve  inflection  tangent.  This  vanishing  point  lies  on  the  vanishing  line  of  the  plane  of 
the  canal  surface  generating  curve.  This  is  illustrated  in  figure  18a.  Note  that  a  straight  line  is  simply  a 
degenerate  inflection,  so  invariants  can  be  obtained  from  a  piecewise  linear  generating  curve. 

Computing Invariants The canal surface is the envelope of spheres, and the canal profile the
envelope of sphere profiles [61]. Under affine imaging conditions, provided the image has the correct
aspect ratio (scaled orthographic projection), the sphere profile is a circle, and the sphere centre projects
to the circle centre. The circle centre can be recovered from the symmetry set^ of the canal profile,
which is thus projectively related to the generating curve, a. This relation is exact under affine imaging
conditions, and is an extremely good approximation under perspective with a realistic field of view,
which is another example of a quasi-invariant. Consequently, invariants computed for the symmetry set are
invariants of the generating curve. For example, invariants can be computed from measurements on the
symmetry set curve in a canonical frame in a similar manner to the ‘footprints’ of Lamdan et al. [34].
Figure 19 shows examples of such curves.

Figure 17: (a) Original image containing several surfaces of revolution. (b) The linked edges computed from
(a). (c) Extracted surface of revolution profiles with axes computed automatically using grouping constraints
based on a harmonic homology. (d) Extracted surface of revolution profiles and axes superimposed on the
original image.

Figure 18: For a canal surface with a planar axis, (a) inflections in the profile occur in pairs for each inflection
of the axis. The intersection of a pair of inflection tangents determines the vanishing point of the tangent
line at the axis inflection. Two such vanishing points determine the vanishing line, $l_\infty$, of the plane of the
axis; (b) corresponding profile tangents (profile points arising from the same surface circular cross-section)
also intersect on $l_\infty$. Their intersection point is the vanishing point of the corresponding axis tangent line.

Figure 19: Affine normalisation of canal surface symmetry sets. Each row shows two images of the same pipe,
and the symmetry set from these views and others transformed to an affine canonical frame. The canonical
frames (c) and (f) contain symmetry sets generated from thirteen and five images respectively. At least half
of each set show significant perspective distortions. Note the variation in pipe width in the middle column
due to perspective. Affine (as opposed to projective) normalisation can be achieved because the vanishing
line of the plane of the generating curve is known (it is computed using the construction of figure 18a). The
canonical frame curves are clearly very stable against variation in viewing position. Moreover, different pipes
can be distinguished based solely on this affine representation [51]. The slight instability present towards the
ends of the canonical frame curves is due to errors in the extracted symmetry set which occur where the pipe
radius changes.

Correspondence and Grouping As in the case of bilateral symmetry, the constraint of a canal
surface with a planar generating curve establishes a planar projective constraint in the image. In this
case two vanishing points determine the vanishing line, $l_\infty$, of the plane containing the generating curve.
Subsequently, inflections on the profile can be paired by the intersections of their tangents on this vanishing
line. Furthermore, it can be shown that this intersection constraint holds for all corresponding
profile points, i.e.,

Corresponding profile tangents (i.e., points whose pre-image is on the same circular cross-section)
intersect on $l_\infty$.

See  figure  18b. 
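
The construction of the vanishing line from two pairs of inflection tangents, and its use for affine normalisation, can be sketched as follows. The tangent lines below are hypothetical homogeneous line vectors, the function names are assumptions, and the rectifying map shown is the standard one that sends a known vanishing line to the line at infinity; it is not necessarily the normalisation used to produce figure 19.

```python
def cross(a, b):
    """Cross product: joins two points or intersects two lines (homogeneous)."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vanishing_line(tangent_pair_1, tangent_pair_2):
    """Line through the two vanishing points given by paired inflection tangents."""
    v1 = cross(*tangent_pair_1)
    v2 = cross(*tangent_pair_2)
    return v1, v2, cross(v1, v2)


def rectifying_map(line):
    """3x3 map whose third row is the vanishing line (assumes line[2] != 0).

    Points on the vanishing line are sent to points with third coordinate
    zero, i.e., to the line at infinity of the rectified image.
    """
    return [[1, 0, 0], [0, 1, 0], list(line)]


def apply(H, x):
    return [dot(row, x) for row in H]


# Hypothetical tangents at two pairs of profile inflections.
pair_1 = ([-1, -10, 10], [1, -10, -10])
pair_2 = ([5, 1, -5], [5, -1, 5])

v1, v2, l_inf = vanishing_line(pair_1, pair_2)
H = rectifying_map(l_inf)
rectified = [apply(H, v1), apply(H, v2)]
```

A further candidate pair of profile tangents can be tested for correspondence by checking that their intersection point x satisfies `dot(l_inf, x) == 0`, up to a tolerance for measured data.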

^The symmetry set is the locus of centres of circles bitangent to a plane curve. It is studied in detail by Giblin &
Brassett [23].

Under affine imaging conditions the two sides of the profile are parallel curves of the symmetry set
(the projection of the generating curve). This follows directly from the profile curves being the envelope
of constant-radius circles swept along the symmetry set.

3.7  Polyhedra 

Recovering  the  structure  of  polyhedral  objects  from  a  single  view  has  been  widely  explored,  with  the 
most  detailed  study  appearing  in  [67].  In  this  work,  Sugihara  shows  that  the  incidence  equations  between 
polyhedral  vertices  and  faces,  observed  in  the  image,  lead  to  a  linear  system  of  equations  in  the  coefficients 
of  the  polyhedron’s  faces  and  image  observations. 

The equations in this system are incidence equations for vertices of the polyhedron incident on plane
faces. In particular, given vertex $V_i = (X_i, Y_i, Z_i)$ lying on face $F_j = (A_j, B_j, C_j, 1)$, it must be the case
that

$A_j X_i + B_j Y_i + C_j Z_i + 1 = 0.$

Assume that the camera image plane is the plane $Z = 1$ and the focal point is at $(0,0,0)$; these assumptions
can be accounted for by the geometric ambiguity in the reconstruction. Then $V_i$ projects to image
point $(u_i, v_i) = (X_i/Z_i, Y_i/Z_i)$. If vertex $V_i$ also lies on face $F_k$, we can divide the incidence equations
by $Z_i$ and subtract to eliminate $1/Z_i$, obtaining:

$(A_j - A_k) u_i + (B_j - B_k) v_i + (C_j - C_k) = 0$

where $u_i$ and $v_i$ are known, and the coefficients of the planes are unknowns. This system of equations
always  has  at  least  a  three  dimensional  family  of  solutions,  corresponding  to  a  polyhedron  where  all 
faces  are  the  same  plane  (since  a  plane  figure  has  three  degrees  of  freedom  in  3D  space).  If  the  family 
of  solutions  is  four  dimensional,  then  a  generic  element  of  the  family  is  a  system  of  planes  that  is 
projectively equivalent to the faces of the original polyhedron. This case holds when the reconstruction
of the polyhedron cannot be made impossible by a small shift of the vertices, that is, is “position free”
in the terminology of Sugihara, and many or most of the visible faces have at least four vertices per face.
Although  this  is  by  no  means  a  generic  polyhedron,  it  is  a  useful  case  because  many  human  artifacts  satisfy 
these  constraints.  Given  the  added  assumption  that  vertices  are  trihedral,  it  is  possible  to  reconstruct 
faces  for  which  only  two  edges  are  visible;  thus,  on  viewing  a  cube,  all  six  faces  can  be  recovered.  This 
leads  to  a  novel  formulation  of  the  aspect  graph  idea,  where  substantively  fewer  aspects  are  necessary  for 
effective  representation.  The  case  where  the  polyhedron  consists  only  of  triangular  faces  is  equivalent  to 
an  unconstrained  set  of  points.  That  is,  a  set  of  points  can  always  be  triangulated  to  form  a  polyhedron. 
As  in  the  case  of  a  general  point  set  (section  3),  vertex  positions  are  unconstrained  by  the  image  view 
and  no  invariants  can  be  constructed. 
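
The dimension count can be checked on a small example. The sketch below builds the linear system for the three faces of a cube that meet at its nearest corner, using the camera model assumed above (image plane Z = 1, focal point at the origin), and computes the dimension of the solution space by exact Gaussian elimination. The cube coordinates, the face numbering and the function names are assumptions made for illustration; this is not Sugihara's implementation.

```python
from fractions import Fraction
from itertools import combinations


def incidence_rows(vertices, faces_of_vertex, n_faces):
    """One row per (vertex, face pair): (A_j - A_k) u + (B_j - B_k) v + (C_j - C_k) = 0.

    The unknowns are ordered (A_0, B_0, C_0, A_1, B_1, C_1, ...).
    """
    rows = []
    for (X, Y, Z), faces in zip(vertices, faces_of_vertex):
        u, v = Fraction(X, Z), Fraction(Y, Z)   # perspective projection
        for j, k in combinations(faces, 2):
            row = [Fraction(0)] * (3 * n_faces)
            row[3 * j:3 * j + 3] = [u, v, Fraction(1)]
            row[3 * k:3 * k + 3] = [-u, -v, Fraction(-1)]
            rows.append(row)
    return rows


def rank(rows):
    """Rank of a matrix of Fractions by Gauss-Jordan elimination."""
    m = [r[:] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


# Hypothetical cube with corners x in {1,2}, y in {1,2}, z in {4,5}.
# Faces: 0 is x = 1, 1 is y = 1, 2 is z = 4 (the three faces nearest the camera).
vertices = [(1, 1, 4), (1, 1, 5), (1, 2, 4), (2, 1, 4)]
faces_of_vertex = [(0, 1, 2), (0, 1), (0, 2), (1, 2)]

rows = incidence_rows(vertices, faces_of_vertex, n_faces=3)
solution_dimension = 3 * 3 - rank(rows)

# The true face planes, written as (A, B, C) with A X + B Y + C Z + 1 = 0.
true_planes = [Fraction(-1), 0, 0, 0, Fraction(-1), 0, 0, 0, Fraction(-1, 4)]
residuals = [sum(a * b for a, b in zip(row, true_planes)) for row in rows]
```

For this configuration the solution space is four dimensional, in agreement with the cube case discussed above, and the true face planes satisfy every equation.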

Computing  Invariants  Assuming  an  uncalibrated  camera,  it  can  be  shown  [59,  60]  that  for  polyhedra 
that  lead  to  a  system  of  equations  having  a  four  dimensional  solution  space  (such  as  cubes)  any  solution 
of  this  system  is  projectively  equivalent  to  the  original  (Euclidean)  polyhedron.  Consequently,  projective 
invariants  of  the  solution  are  the  same  as  those  measured  on  the  original  polyhedron. 

Correspondence  and  Grouping  Approaches  to  grouping  and  correspondence  for  this  class  are 
well  established  from  the  decade  or  so  of  blocks  world  vision  research.  The  main  basis  for  grouping 
is topological, where one seeks to construct a complete polyhedral structure with consistent incidence
relations between vertices, edges and faces.

For particular sub-classes of polyhedra, and for particular aspects, further constraints are available.
For  example,  a  cube  has  three  major  directions  which  define  a  triple  of  vanishing  points  in  the  image. 
All  edges  aligned  with  a  major  direction  must  pass  through  the  same  vanishing  point.  Similar  incidence 
constraints  apply  to  any  polyhedra  projectively  equivalent  to  a  cube.  Constraints  of  this  type  can  be 
used to extract a polyhedral wireframe from a polyhedral silhouette, inferring internal boundaries.

Figure 20: An extruded surface is a section cut from a general cone by two planes. (a) A range of examples of
extruded surfaces; note that for most examples, the vertex is at infinity. (b) The top and base image curves,
C1 and C2, of an extruded surface are related by a particular projective transformation T, called a planar
homology. Corresponding points lie on lines through V, which is the fixed point of the transformation (the
centre or vertex). The line L, which is the image of the intersection of the two planes that cut the “cone”, is
a line of fixed points of the transformation (the axis).

3.8  Extruded  Surfaces 

An  extruded  surface  is  a  special  case  of  a  generalised  cylinder,  formed  by  a  section  cut  from  a  general 
cone  by  two  planes  (see  figure  20a)  in  such  a  way  that  the  section  of  surface  does  not  include  the  vertex 
of the cone [21]. This is the projective generalisation of a surface formed by a system of parallel lines,
with  plane  ends  (such  a  surface  can  be  extruded  from  a  nozzle).  Extruded  surfaces,  and  surfaces  made 
up  from  extruded  components,  are  extremely  common  —  examples  include  most  tin  cans,  boxes,  books, 
and  many  plastic  bottles. 

Outline Geometry The base and top curve are perspectively related in 3D, and thus related in
the image by a projective transformation, T. This transformation is a planar homology [64]. It has five
degrees of freedom: the vertex (2 dof), axis (2 dof) and the cross-ratio defined by the vertex, a pair of
corresponding points, and the intersection of the line joining these points with the axis (1 dof). The
cross-ratio is the same for all points related by the homology^. As in the case of a planar harmonic
homology  (figure  14)  a  planar  homology  has  a  line  of  fixed  points  and  a  fixed  point  not  on  this  line: 

1.  The  homology  vertex  is  the  projection  of  the  3D  cone  vertex.  It  is  the  fixed  point  of  T. 

2.  The  homology  axis  is  the  projection  of  the  line  of  intersection  of  the  top  and  base  planes.  It  is  the 
line  of  fixed  points  of  T. 

The  profile  curves  of  an  extruded  surface  are  a  pair  of  lines  which  intersect  at  the  image  vertex. 

Computing  Invariants  The  projective  geometry  of  an  extruded  surface  is  completely  defined  by 
three  elements:  a  plane  cross-section;  a  cone  vertex,  not  on  the  plane;  and,  a  line  in  the  plane.  The 
plane  cross-section  and  vertex  together  define  the  cone.  The  line  is  the  axis  of  the  pencil  of  planes  which 
intersect  the  cone  to  generate  the  top  and  base  curves.  These  elements  can  be  recovered  from  an  image 
of  the  surface  since  the  cross-section  of  the  cone  is  determined  up  to  a  projective  transformation  from  the 
imaged  base  curve  or  top  curve,  and  the  line  is  the  line  of  fixed  points  of  the  projective  transformation 
relating  top  and  base  image  curves. 


^In the case of a harmonic homology, the cross-ratio is harmonic, i.e., known, so there are only four remaining degrees of
freedom. The sides of the profile of a surface of revolution are related by a harmonic homology, as described in section 3.5.


In  essence,  the  invariants  of  an  extruded  surface  are  those  of  the  plane  cross-section  plus  an  extra  line 
in  the  plane,  obtained  from  intersection  of  the  base  and  top  planes.  Thus  extra  invariants  are  available 
over  the  plane  cross-section  alone.  For  example,  in  figure  20  a  five  line  invariant  can  be  computed  from 
the  image,  although  the  top  curve  only  contains  four  lines.  In  the  case  that  the  top  and  base  planes  are 
parallel,  affine  invariants  of  the  curve  can  be  measured  from  a  perspective  image. 

Correspondence  and  Grouping  As  described  above,  for  an  extruded  surface  the  top  and  base  image 
curves  are  related  by  a  planar  homology,  T.  Grouping  proceeds  by  finding  curves  which  are  projectively 
related.  The  class  assumption  can  then  be  tested  immediately  since  the  projective  transformation  must  be 
a  homology  if  the  curves  are  from  an  extruded  surface  (for  example,  two  of  the  eigenvalues  will  be  equal). 
The  homology  then  defines  the  vertex,  axis,  and  correspondence  for  the  surface,  which  is  used  for  further 
grouping.  Additionally,  since  the  surface  is  ruled,  and  all  rulings  pass  through  the  vertex,  the  intersection 
of  line  segments  in  the  profile  determines  the  imaged  vertex.  Similarly,  all  corresponding  point  pairs  on 
C1 and C2 (figure 20b) define a pencil of lines which pass through the vertex, and corresponding tangents
intersect on the line of fixed points, L.
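
The properties of a planar homology used for grouping can be checked with a short sketch. It builds a homology from a vertex v, an axis a and a characteristic value mu using the standard form T = (a.v) I + (mu - 1) v a^T, and evaluates the cross-ratio of the vertex, the axis intersection and a pair of corresponding points. The function names, the sample vertex and axis, and the ordering convention for the cross-ratio are assumptions made for illustration.

```python
from fractions import Fraction


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def homology(v, a, mu):
    """Planar homology with vertex v, axis a and characteristic value mu."""
    s = dot(a, v)
    return [[(s if i == j else 0) + (mu - 1) * v[i] * a[j] for j in range(3)]
            for i in range(3)]


def apply(T, x):
    return [dot(row, x) for row in T]


def characteristic_cross_ratio(T, v, a, x):
    """Cross-ratio {v, i; x, x'} where x' = T x and i is where their join meets the axis.

    Assumes x is neither the vertex nor a point on the axis.
    """
    xp = apply(T, x)
    line = cross(x, xp)            # join of the pair; passes through v
    i = cross(line, a)             # intersection of the join with the axis

    def k(p, q):
        # Signed measure of the separation of two points on `line`.
        return dot(cross(p, q), line)

    return Fraction(k(v, x) * k(i, xp), k(v, xp) * k(i, x))


vertex, axis = [0, 0, 1], [0, 1, -1]      # hypothetical vertex and axis
T = homology(vertex, axis, 3)
ratios = [characteristic_cross_ratio(T, vertex, axis, x)
          for x in ([2, 3, 1], [5, -2, 1])]
harmonic = characteristic_cross_ratio(homology(vertex, axis, -1),
                                      vertex, axis, [2, 3, 1])
```

The cross-ratio takes the same value for both sample points, and equals -1 in the harmonic case that arises for surfaces of revolution.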

3.9  Algebraic  Surfaces 

Algebraic surfaces are surfaces for which a single polynomial vanishes: examples include spheres
($x^2 + y^2 + z^2 - r^2 = 0$) and ellipsoids, which are both degree two surfaces (quadrics), and a wide range of popular
surfaces in modelling such as rational bicubic patches. Smooth quadrics are all projectively equivalent
(just  as  all  conics  are  projectively  equivalent)  so  that  there  are  no  projective  invariants  of  the  surface 
to  recover  from  images.  Although  a  single  quadric  does  not  have  any  projective  invariants,  two  or  more 
quadrics  do.  Similarly,  if  the  surface  has  degree  3  or  greater,  there  are  projective  invariants  to  recover 
from  images.  In  theory  these  invariants  can  be  recovered  from  the  surface  profile  alone  [22],  though  this 
has  not  been  implemented  in  practice. 

4  An  architecture  for  a  3D  recognition  system 

We  have  demonstrated  that  a  large  vocabulary  of  3D  invariants  can  be  derived  from  the  geometric 
constraints associated with object class definitions, e.g., that of a surface of revolution. In general, these
curve,  surface  or  volume  class  constraints  enable  the  construction  of  invariants,  and  permit  at  least  partial 
reconstruction  of  the  3D  structure  from  a  single  perspective  view.  Class  constraints  also  provide  image 
feature  grouping  mechanisms  and  associated  indexing  machinery. 

The  work  to  date,  however,  has  focused  on  the  derivation  of  invariants,  structure  recovery,  and 
grouping  for  single  object  classes.  Experimental  validation  has  been  restricted  to  isolated  objects  of  a 
given  class  against  an  uncluttered  background.  An  important  next  step  is  to  integrate  the  approaches 
which  have  been  developed  into  a  unified  3D  object  recognition  system.  It  is  only  in  the  context  of  a  full 
system  that  the  effectiveness  of  a  class-based  invariant  representation  for  recognition  can  be  convincingly 
demonstrated. 

4.1  Fundamental  Principles 

Object  recognition  should  be  based  on  3D  geometric  descriptions,  both  of  objects  and  of  the  relationships 
between  objects.  To  date,  systems  have  largely  ignored  these  relationships;  as  we  show  below,  requiring 
consistency  in  inter-object  relationships  yields  substantial  information.  In  the  architecture  we  describe, 
this  information  is  encapsulated  in  an  internal  database,  known  as  the  scene.  The  scene  provides  a 
working  reconstruction  against  which  hypotheses  can  be  checked  to  provide  immediate  detection  of  a 
false  recognition  hypothesis.  For  example,  if  two  objects  are  hypothesised  in  such  a  way  that  one  must 
be  wholly  occluded  by  the  other,  then  at  least  one  of  the  object  hypotheses  must  be  wrong. 

Central to the architecture is efficient management of control at each level. Even for relatively small
images, vast numbers of hypotheses for feature correspondences and model interpretations can be constructed.
It is impossible to explore all avenues of interpretation, so some basis must be established for
scheduling feature combination, hypothesis generation and verification of hypotheses. The priority of
scheduling should be based on a tradeoff between the cost and the benefit of a computation.
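
One simple way to realise such a schedule is a priority queue ordered by estimated benefit per unit cost. The sketch below is purely illustrative: the class name, the task names, and in particular the use of the ratio benefit/cost as the priority are assumptions, since the form of the tradeoff is not fixed here.

```python
import heapq


class Scheduler:
    """Priority queue of pending computations, best benefit/cost ratio first."""

    def __init__(self):
        self._heap = []
        self._count = 0          # tie-breaker that preserves submission order

    def submit(self, name, benefit, cost):
        priority = benefit / cost            # assumed form of the tradeoff
        heapq.heappush(self._heap, (-priority, self._count, name))
        self._count += 1

    def next_task(self):
        """Name of the most worthwhile pending task, or None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


scheduler = Scheduler()
scheduler.submit("verify hypothesis", benefit=8.0, cost=4.0)
scheduler.submit("group curves", benefit=3.0, cost=1.0)
scheduler.submit("index model", benefit=5.0, cost=5.0)
order = [scheduler.next_task() for _ in range(4)]
```

Cheap computations with a moderate payoff are run before expensive ones with a larger payoff, which is the behaviour the cost-benefit principle asks for.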

Finally,  class  pervades  the  architecture,  influencing  segmentation,  grouping,  indexing,  and  hypothesis 
confirmation. 

4.1.1  Class 

The  idea  that  objects  should  be  organised  in  a  taxonomy  and  classified  before  proceeding  to  recognition 
is  a  natural  and  well-accepted  principle.  The  problem  with  this  philosophy  is  that  many  ontological 
distinctions  are  not  manifested  in  observable  properties,  for  example,  the  difference  between  a  hollow 
container  and  a  solid  block.  Our  geometric  approach  to  object  classification  is  based  directly  on  visible 
features;  its  main  strength  is  that  it  is  not  vested  in  abstract,  philosophical  differences,  so  much  as  in 
image  observable  distinctions.  Object  class  has  its  most  important  effects  in  considering  feature  grouping 
and  the  structure  of  the  modelbase. 

Class drives grouping, as opposed to the usual “heuristics” that are used to associate image features.
Each  object  class  defines  a  grouping  mechanism  based  on  its  image  invariant  relation.  For  this  reason, 
object  class  is  typically  settled  at  an  early  stage  in  the  grouping  process,  and  identity  emerges  only  after 
modelbase  access.  For  example,  there  is  no  point  in  grouping  lines  into  faces,  as  required  for  polyhedral 
class  grouping,  to  recognise  a  rotationally  symmetric  object.  A  rotational  symmetry  hypothesis  requires 
image  curves  related  by  a  planar  harmonic  homology.  The  projective  matching  of  these  curves  can  be 
carried  out  by  computing  and  matching  projective  invariants  of  the  curves.  This  is  an  application  of 
planar  object  recognition  techniques  within  a  single  image. 
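The harmonic homology constraint can be made concrete with a small sketch. It assumes the standard parametrisation of a planar harmonic homology by a vertex v and an axis l (both homogeneous 3-vectors), T = I - 2 v l^T / (l . v), and checks that T is an involution up to scale. The function names are hypothetical and are not part of the system described here.

```python
def harmonic_homology(vertex, axis):
    """3x3 matrix T = I - 2 * v * l^T / (l . v) for vertex v and axis l."""
    d = sum(a * b for a, b in zip(axis, vertex))
    if d == 0:
        raise ValueError("vertex must not lie on the axis")
    return [[(1.0 if i == j else 0.0) - 2.0 * vertex[i] * axis[j] / d
             for j in range(3)] for i in range(3)]


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def is_involution(T, tol=1e-9):
    """True if T*T is a non-zero multiple of the identity (T is projective)."""
    T2 = matmul(T, T)
    scale = (T2[0][0] + T2[1][1] + T2[2][2]) / 3.0
    if abs(scale) < tol:
        return False
    return all(abs(T2[i][j] - (scale if i == j else 0.0)) <= tol * abs(scale)
               for i in range(3) for j in range(3))


# Vertex at infinity along x, axis x = 0: an ordinary mirror symmetry.
mirror = harmonic_homology((1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
general = harmonic_homology((2.0, 1.0, 1.0), (1.0, -1.0, 3.0))
shear = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(is_involution(mirror), is_involution(general), is_involution(shear))
```

A shear is included as a counter-example: it is a projective map of the plane but not an involution, so it could not relate the two sides of such a profile.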

Class  determines  the  access  functions  and  partitioning  of  the  modelbase.  The  modelbase  itself  is  a 
collection  of  facts  about  objects  and  their  properties.  These  facts  must  be  organised  in  such  a  way  as 
to  allow  easy  retrieval;  a  hashing  mechanism  is  appropriate.  By  the  time  the  modelbase  is  accessed, 
the  object  group  will  contain  a  strong  implicit  hypothesis  about  object  class  —  for  example,  a  pair  of 
concavity-curves  cannot  be  passed  to  the  polyhedral  hashing  mechanism.  In  fact,  the  modelbase  can  be 
viewed  as  a  rather  conventional  database,  organised  to  answer  certain  queries  very  efficiently. 

4.1.2  Consistency 

Consistency  tests  arise  from  computing  and  representing  relationships  between  objects.  To  date,  there 
have  been  few  “hard”  geometric  consistency  tests  for  inter-object  relations.  In  fact,  strong  geometric 
tests  emerge  from  the  observation  that  objects  share  the  same  Euclidean  frame  and  the  same  camera. 
These  tests  make  it  possible  to  recover  the  Euclidean  identity  of  objects  even  if  the  calibration  of  the 
camera  is  initially  unknown. 

Suppose that models are Euclidean (i.e., the relation between the model and object is a Euclidean
transformation, as opposed to the projective transformation of the 2D recognition system), and
recognition hypotheses have been formed for a number of objects. Even though the camera is uncalibrated,
the Euclidean consistency of the recognition hypotheses can be tested by a comparison of the set of ray
cones from the optical centre to each object.

The cones are determined from P, the rank three, 3 x 4 projection matrix of equation (2). Given a
hypothesised object, P is determined from the known 3D Euclidean geometry and the image features by
standard resectioning (as in, for example, [56]). Partitioning P as P = [M | -Mt] [29], t is the optical
centre; in homogeneous coordinates, (t, 1) spans the null space of P. A cone of rays from the optical
centre to other Euclidean objects in the scene can then be constructed.
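A minimal sketch of recovering the optical centre from a given P follows. It assumes the left 3 x 3 block M is nonsingular and solves M t = -p4 by Cramer's rule, where p4 denotes the last column of P; the function names and the numerical example are illustrative assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def camera_centre(P):
    """Optical centre t of P = [M | -M t], found by solving M t = -p4."""
    M = [list(row[:3]) for row in P]
    p4 = [row[3] for row in P]
    d = det3(M)
    if abs(d) < 1e-12:
        raise ValueError("M is singular: the centre is not a finite point")
    t = []
    for k in range(3):
        Mk = [row[:] for row in M]
        for i in range(3):
            Mk[i][k] = -p4[i]  # replace column k by the right-hand side
        t.append(det3(Mk) / d)
    return t


# Made-up camera: M is upper triangular, centre at (1, 2, -5).
P = [[800.0, 0.0, 320.0, 800.0],
     [0.0, 800.0, 240.0, -400.0],
     [0.0, 0.0, 1.0, 5.0]]
t = camera_centre(P)
# (t, 1) should be annihilated by P.
residual = [sum(P[i][j] * c for j, c in enumerate(t + [1.0])) for i in range(3)]
print(t, residual)
```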

If  the  hypotheses  for  each  object  are  correct,  the  ray  cones  for  each  object  should  be  Euclidean 
equivalent.  That  is,  there  will  be  a  rotation  about  the  optical  centre  which  superimposes  the  cones  for 
each  object  from  each  hypothesis.  Thus,  inconsistent  hypotheses  can  be  detected  by  the  failure  of  this 
test  and  the  goal  is  to  build  up  the  largest  pairwise  consistent  set  of  object  hypotheses.  An  example  of 
hypothesis  labelling  and  reconstruction  is  shown  in  figure  21. 
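One necessary condition for two ray cones to be related by a rotation about the optical centre is that the angles between corresponding pairs of rays agree. The sketch below tests only that condition; it is an assumption about how such a check might be organised, not the procedure used to produce figure 21, and it would also accept a reflection. The ray directions are taken as given inputs, one list per hypothesis, in corresponding order.

```python
import math
from itertools import combinations


def angle(u, v):
    """Angle in radians between two 3D direction vectors."""
    d = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, d / (nu * nv))))


def cones_consistent(rays_a, rays_b, tol=1e-3):
    """True if all pairwise ray angles agree to within tol radians."""
    if len(rays_a) != len(rays_b):
        raise ValueError("ray lists must correspond one to one")
    return all(
        abs(angle(rays_a[i], rays_a[j]) - angle(rays_b[i], rays_b[j])) <= tol
        for i, j in combinations(range(len(rays_a)), 2)
    )


rays_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
# The same cone rotated by 90 degrees about the z axis.
rays_b = [(0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (-1.0, 1.0, 1.0)]
# A cone that no rotation can superimpose on rays_a.
rays_c = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 3.0)]
print(cones_consistent(rays_a, rays_b), cones_consistent(rays_a, rays_c))
```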

Another consistency test involves decomposing the matrix M (above) as M = KR by QR decomposition
[25], where R is a rotation matrix, and K an upper triangular matrix containing the intrinsic
parameters of the camera. Each hypothesis must agree on the camera intrinsic parameters, and inconsistent
hypotheses can be detected from differing decompositions for K.
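The factorisation M = KR can be sketched by Gram-Schmidt orthogonalisation of the rows of M, working from the last row upwards. This is one way of obtaining an upper triangular K with positive diagonal; it is offered as an assumption-laden illustration rather than the routine of [25]. M is assumed nonsingular, K is normalised so that its bottom-right entry is 1, and the tolerance used to compare two hypotheses is arbitrary.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decompose_kr(M):
    """Return (K, R) with M = K R, K upper triangular, rows of R orthonormal.

    R has determinant -1 if det(M) < 0; flip the sign of M first if needed.
    """
    m1, m2, m3 = M
    k33 = math.sqrt(dot(m3, m3))
    r3 = [x / k33 for x in m3]
    k23 = dot(m2, r3)
    w2 = [a - k23 * b for a, b in zip(m2, r3)]
    k22 = math.sqrt(dot(w2, w2))
    r2 = [x / k22 for x in w2]
    k12, k13 = dot(m1, r2), dot(m1, r3)
    w1 = [a - k12 * b - k13 * c for a, b, c in zip(m1, r2, r3)]
    k11 = math.sqrt(dot(w1, w1))
    r1 = [x / k11 for x in w1]
    K = [[k11, k12, k13], [0.0, k22, k23], [0.0, 0.0, k33]]
    return K, [r1, r2, r3]


def normalised_intrinsics(M):
    """K scaled so that K[2][2] = 1, removing the arbitrary scale of P."""
    K, _ = decompose_kr(M)
    return [[x / K[2][2] for x in row] for row in K]


def intrinsics_agree(Ma, Mb, rel_tol=0.01):
    """Compare normalised K matrices entry by entry (tolerance is assumed)."""
    Ka, Kb = normalised_intrinsics(Ma), normalised_intrinsics(Mb)
    bound = rel_tol * Ka[0][0]
    return all(abs(Ka[i][j] - Kb[i][j]) <= bound
               for i in range(3) for j in range(3))
```

Two hypotheses whose M matrices differ only by a rotation and an overall scale yield the same normalised K, whereas a hypothesis implying a different focal length does not.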
Figure 21: (a) A scene containing polyhedra, all of which are projectively equivalent but Euclidean inequivalent.
(b) A labelling of the scene. (c) A Euclidean reconstruction of the polyhedral world shown in (a). The optical
centre is the marked point in the top right-hand corner of the figure.


Given  an  image  containing  a  large  number  of  known  objects,  once  the  first  few  have  been  recognised 
and  used  to  construct  a  consistent  world  frame,  this  frame  can  be  accepted  and  used  to  prune  additional 
hypotheses  in  a  depth-first  search  for  consistency.  At  this  point,  rather  than  searching  for  consistent 
groups  of  object  hypotheses,  individual  hypotheses  can  be  tested  against  the  established  frame  with  little 
risk  of  error.  Furthermore,  if  this  frame  is  accepted,  then  it  can  be  used  to  condition  grouping  and 
indexing  activities.  The  Euclidean  reconstruction  of  the  world  forms  the  scene  database. 


4.2  The  Architecture 

Representation is organised into a number of layers, as illustrated in figure 22. These stages of
representation are not very different from other recognition architectures; however, the three principles
of class, global consistency and control provide a unifying theme.

Segmentation  and  grouping  The  key  to  successful  recognition  is  efficient  and  robust  feature 
segmentation  and  grouping.  There  are  four  levels  of  image  feature  representation  and  grouping: 

Level  I:  Pixel-level  features  are  defined  with  respect  to  an  image  coordinate  system  and  reflect  the 
quantised  nature  of  pixel  coordinates.  Typically,  features  will  be  produced  using  an  edge  operator 
with  sub-pixel  accuracy,  and  the  resulting  edgels  linked  into  a  network  reflecting  the  topology  of 
the  image  boundaries. 

Level II: Geometric features Curves from level I are described in terms of geometric primitives, where
appropriate. Examples include algebraic curves such as line segments and conics, smooth curves, and
concavities defined by bitangents.

Level III: Generic Grouped features This level of grouping is applied to all features produced at level
II. The output is a number of groupings and databases which are used by the class-based groupers
described below. Generic grouping includes: near incidence (jumping small gaps, completing corners
and junctions); collinearity; marking bitangent and other distinguished points; finding sets of
parallel line segments; affine or projective equivalence of curve segments (e.g., concavities). These
relations can be viewed as queries to a spatially organised database. For example, typical queries
might be: “what other lines are parallel to a given line and above a certain length in the region
of interest?”, or “what other lines are collinear with the given line over the entire image?”. In
the current design there is no attempt at enforcing “backwards compatibility”. For example, if a
grouper at level III hypothesises that two curves should be joined, there is no attempt to correct the
level II representation. Ultimately, it may be important to ensure such consistency between levels.

Figure 22: The proposed architecture for object recognition organised around geometric classes with associated
grouping and indexing methods.

Level  IV:  Class-based  grouping  Each  class  has  an  associated  “class-based  grouper”  that  interrogates 
the  level  III  groupings  and  databases,  and  attempts  to  form  groups  appropriate  for  its  class.  The 
grouping  mechanism  is  based  on  the  image  invariant-relation  as  described  in  section  3  for  each  class. 
A  good  example  is  given  by  the  rotationally  symmetric  class  which  defines  a  grouping  constraint  in 
terms  of  the  harmonic  homology  between  the  corresponding  sides  of  the  profile  (section  3.5). 

In  addition  to  grouping,  such  constraints  can  be  used  to  repair  missing  portions  of  the  outline  due 
to  occlusion  or  poor  contrast.  For  example,  for  a  surface  of  revolution,  a  “snake”  or  deformable 
template  can  be  defined  by  one  side  and  applied  to  the  other  under  the  transformation  of  the 
homology. The transformation between both sides can be iterated to improve the geometric
correspondence of both sides. Such class-based snakes can also augment the initial edgel extraction
process. Figure 23 shows an example of repair and augmentation, where a polyhedral class snake
recovers poorly defined interior edges from the exterior polyhedral outline, again based on the class
constraints.

Indexing  and  Hypothesis  Combination  The  groups  defined  by  each  class  also  define  the  indexing 
function  used  to  retrieve  specific  objects  from  the  modelbase.  For  example,  for  a  canal  surface  the  indexes 
are  computed  from  the  symmetry  set  of  the  profile,  for  a  surface  of  revolution  from  distinguished  points 
on  the  axis. 

Indexing  is  handled  by  a  series  of  hash-tables,  one  per  class,  that  take  the  invariants  of  a  set  of  grouped 
features  and  associate  with  them  models  in  the  modelbase.  For  complex  objects,  there  may  be  many 
feature  groups  that  index  to  the  object,  leading  to  a  situation  where  a  single  instance  could  generate  many 
recognition  hypotheses.  In  the  planar  recognition  system,  this  problem  is  handled  by  merging  consistent 
object  hypotheses  into  joint  hypotheses. 
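The per-class hash tables can be illustrated with a small sketch. The quantisation of invariant values into fixed-width bins, the bin width, and the class and model names are assumptions made for the example; the text does not specify how keys are formed.

```python
import math


class ClassIndex:
    """One hash table per object class, keyed by quantised invariants."""

    def __init__(self, bin_width=0.05):
        self.bin_width = bin_width
        self.tables = {}  # class name -> {key tuple -> set of model names}

    def _key(self, invariants):
        return tuple(int(math.floor(v / self.bin_width)) for v in invariants)

    def add_model(self, object_class, invariants, model):
        table = self.tables.setdefault(object_class, {})
        table.setdefault(self._key(invariants), set()).add(model)

    def lookup(self, object_class, invariants):
        """Models of the given class whose stored invariants share the bin."""
        table = self.tables.get(object_class, {})
        return set(table.get(self._key(invariants), set()))


index = ClassIndex(bin_width=0.05)
index.add_model("surface_of_revolution", (0.42, 1.37), "vase_A")
index.add_model("canal_surface", (0.42, 1.37), "pipe_B")

# A noisy measurement from an image group already classed as a
# surface of revolution only consults that class's table.
print(index.lookup("surface_of_revolution", (0.43, 1.36)))
print(index.lookup("polyhedron", (0.43, 1.36)))
```

A measurement close to a bin boundary can fall into a neighbouring bin, so a practical version would probably also probe adjacent bins; that refinement is omitted here.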

Forming joint hypotheses (cliques) is fairly successful for small numbers of feature groups, but for
more complex objects, there are potentially quite substantial combinatorial problems. However, the
principle that feature groups belonging to the same object should accrete into a more complex feature
grouping is a good one. This accretion can be implemented in a more general fashion as follows: if a
feature  group  results  in  a  successful  indexing  attempt  to  a  relatively  small  number  of  models,  it  leaves  a 
record  of  that  attempt  in  an  image-scene  relational  data  structure.  When  another  feature  group  indexes 
to  the  same  object  or  list  of  objects,  and  is  within  some  grouping  horizon  of  the  first  group,  the  two 
feature  groups  can  be  associated  in  a  larger  feature  grouping,  based  on  their  correspondence  to  the  same 
object  structure.  To  make  this  record,  the  system  forms  a  collection  of  keys  out  of  the  image  feature 
position  and  each  possible  object  model  in  turn,  and  stores  a  unique  identifier  for  the  image  feature 
group  in  an  image-scene  database  using  these  keys.  The  storage  mechanism  is  such  that,  if  the  database 
is  queried  using  a  model  identity  and  feature  position,  it  will  return  any  image  features  that  indexed  that 
model  and  are  “near”  (for  some  horizon)  the  original  feature  group.  Note  that  other  forms  of  image- 
scene  information  could  be  used  in  addition  to  Euclidean  distance  in  the  image;  for  example,  an  indexing 
hypothesis  might  be  associated  with  a  pose  or  frame  hypothesis. 
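A minimal sketch of such an image-scene database is given below. It assumes keys of the form (model identity, grid cell), with the cell size equal to the grouping horizon, so that a query need only inspect the 3 x 3 block of cells around the query position. The class name, method names and example data are hypothetical.

```python
import math


class ImageSceneDB:
    """Records which feature groups indexed which models, and where."""

    def __init__(self, horizon):
        self.horizon = horizon
        self._cells = {}  # (model, cx, cy) -> list of (group id, position)

    def _cell(self, pos):
        return (int(math.floor(pos[0] / self.horizon)),
                int(math.floor(pos[1] / self.horizon)))

    def record(self, group_id, pos, models):
        """Store one key per candidate model for this feature group."""
        cx, cy = self._cell(pos)
        for model in models:
            self._cells.setdefault((model, cx, cy), []).append((group_id, pos))

    def nearby(self, model, pos):
        """Groups that indexed `model` within the horizon of `pos`."""
        cx, cy = self._cell(pos)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for gid, p in self._cells.get((model, cx + dx, cy + dy), []):
                    if math.hypot(p[0] - pos[0], p[1] - pos[1]) <= self.horizon:
                        found.append(gid)
        return sorted(found)


db = ImageSceneDB(horizon=50.0)
db.record("g1", (100.0, 100.0), ["mug", "vase"])
db.record("g2", (130.0, 120.0), ["mug"])
db.record("g3", (400.0, 400.0), ["mug"])
print(db.nearby("mug", (120.0, 110.0)))
```

Groups returned by the same query are candidates for merging into a larger feature grouping for that model.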

Verification  In  the  planar  system,  two  stages  of  verification  were  used:  plausibility  of  the  projective 
transformation  taking  the  object  from  the  model  to  image  frame,  and  image  support  measured  by  the 
proportion  of  the  back-projected  model  perimeter  that  lines  up  with  image  features.  Such  verification, 
based  only  on  object  outline,  can  fail  through  accidental  correspondences  with  texture  (for  example, 
oriented  markings  such  as  wood  grain). 

To  avoid  this  problem,  verification  is  augmented  in  a  number  of  ways.  First,  surface  markings  and 
surface  texture  will  be  stored  for  each  object  in  the  model  library.  During  verification,  the  internal 
surface  properties  of  an  object  can  be  compared  with  the  properties  actually  observed  in  the  interior  of 
a  model  hypothesis.  Second,  the  reliability  of  verification  will  be  improved  by  scene  consistency  analysis. 
For  example,  if  one  object  is  deemed  to  be  behind  another  with  respect  to  a  given  camera  viewpoint, 
then  it  would  be  inconsistent  to  declare  a  large  portion  of  confirmed  boundary  for  the  occluded  object. 
More  generically,  the  “score”  for  a  hypothesis  is  improved  if,  when  portions  of  the  perimeter  cannot  be 
matched,  there  is  independent  evidence  of  an  occlusion  occurring.  For  example,  one  piece  of  evidence  for 
occlusion  is  that  aligned  “T”  junctions  occur  at  each  end  of  the  occlusion. 
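To illustrate how occlusion evidence might raise a verification score, the sketch below compares the plain image support (matched perimeter over total perimeter) with a variant that discounts unmatched perimeter for which there is independent evidence of occlusion. The discounting rule and the data layout are assumptions made for the example, not the scoring used in the planar system.

```python
from dataclasses import dataclass


@dataclass
class PerimeterSegment:
    length: float
    matched: bool                     # lines up with image features
    occlusion_evidence: bool = False  # e.g. aligned "T" junctions at its ends


def image_support(segments):
    """Proportion of the back-projected perimeter matched in the image."""
    total = sum(s.length for s in segments)
    matched = sum(s.length for s in segments if s.matched)
    return matched / total if total > 0 else 0.0


def occlusion_aware_support(segments):
    """As image_support, but ignoring unmatched perimeter explained by occlusion."""
    explained = sum(s.length for s in segments
                    if not s.matched and s.occlusion_evidence)
    accountable = sum(s.length for s in segments) - explained
    matched = sum(s.length for s in segments if s.matched)
    return matched / accountable if accountable > 0 else 0.0


outline = [
    PerimeterSegment(40.0, matched=True),
    PerimeterSegment(30.0, matched=False, occlusion_evidence=True),
    PerimeterSegment(30.0, matched=False),
]
print(image_support(outline), occlusion_aware_support(outline))
```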

The  modelbase  The  modelbase  will  be  organised  around  object  class.  For  each  class  there  will  be 
appropriate hash-tables for indexing, and a database of models. For example, canal surfaces and surfaces
of  revolution  will  have  separate  indexing  tables,  containing  respectively  affine  and  projective  invariants, 
and  separate  model  libraries. 

In  the  2D  recognition  system  (section  2.3)  the  modelbase  typically  acted  as  a  passive  repository  that 
contained  geometric  models,  and  was  indexed  to  identify  an  object.  The  modelbase  can  be  more  powerful 
than  this.  It  can  also  store  aggregated  statistics  derived  from  all  the  models  in  each  class  library.  These 
can  be  used  to  improve  efficiency.  For  example,  suppose  the  maximum  number  of  undulations  of  any 
surface  of  revolution  in  the  library  is  stored.  Then  if  a  putative  profile  is  returned  by  the  grouper  which 
has  more  undulations  than  this,  there  is  no  point  computing  invariants  or  indexing.  Similarly,  if  there  are 
only  trihedral  vertices  for  any  polyhedra,  then  there  is  no  need  for  the  polyhedral  grouper  to  attempt  to 
group or index with four concurrent lines. Exploiting the modelbase in this manner can greatly strengthen
the  performance  of  the  system. 
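The undulation example can be sketched as a simple pre-indexing filter. The statistic stored and the function names are illustrative assumptions; only the idea of rejecting a putative profile before computing invariants is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class ClassStatistics:
    max_undulations: int  # largest undulation count over the class library


def build_statistics(undulation_counts):
    """Aggregate a statistic over all models in one class library."""
    return ClassStatistics(max_undulations=max(undulation_counts))


def worth_indexing(profile_undulations, stats):
    """Skip invariant computation when no library model could match."""
    return profile_undulations <= stats.max_undulations


# Hypothetical library of surfaces of revolution with 1, 3 and 2 undulations.
stats = build_statistics([1, 3, 2])
print(worth_indexing(2, stats), worth_indexing(5, stats))
```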

Objects  which  do  not  correspond  to  a  single  volumetric  primitive,  i.e.,  composite  [72]  objects,  will 
have  multiple  representations:  each  representation  covering  a  possible  image  segmentation  and  grouping. 
For  example,  a  mug  might  be  represented,  in  the  composite  3D  structure  class,  as  a  surface  of  revolution 
together  with  a  canal  surface  (the  handle).  Equally,  the  handle  could  be  represented  as  a  digital  plane 
curve,  and  the  mug  body  as  a  canal  surface  or  extruded  surface.  All  such  representations  will  be  included 
in  the  modelbase.  It  is  only  through  recognition  that  the  common  concept  “mug”  is  achieved. 

The  scene  An  additional  source  of  constraints  and  parameters  is  the  3D  scene,  which  can  also  be 
viewed  as  a  database  which  reflects  the  current  configuration  of  the  world  and  cameras.  It  provides  a 
representation  of  all  the  information  currently  available  about  the  common  Euclidean  frame  in  which 
objects  reside.  This  3D  spatial  layout  can  be  used  at  a  number  of  stages,  for  example,  to  determine 
occlusion relations amongst model hypotheses, and for camera viewpoint consistency.

4.3 Model acquisition

Typically,  models  will  be  acquired  from  multiple  views  of  objects.  The  fact  that  such  models  can  serve  as 
sufficient  representations  for  recognition  is  a  major  advantage  of  the  invariance  approach  to  recognition. 
Our  goal  is  to  provide  a  model  acquisition  tool  which  permits  additions  to  the  model  library  to  be  as 
simple  as  providing  four  or  five  unoccluded  views  of  the  object.  This  goal  was  achieved  in  the  planar 
object  recognition  system,  where  only  one  or  two  unoccluded  views  were  required  to  construct  a  model. 

Of course, the problem is more difficult for 3D objects, but the partitioning into classes, based on
geometry, will permit the efficient grouping and correspondence construction required for model description.
Since  the  object  classes  consist  of  3D  volumetric  primitives,  we  expect  that  only  a  small  number  of  views 
will  be  required  for  most  objects,  and  that  these  views  will  be  defined  by  the  extraction  of  a  sufficient  set 
of  stable  features  over  a  wide  range  of  viewpoints.  This  contrasts  with  the  construction  of  aspect  graphs 
based  on  topology  [65],  which  define  a  large  number  of  aspects,  or  distinct  views,  which  cannot  be  reliably 
distinguished  from  image  feature  groups.  That  is,  fine  topological  properties  of  an  image  feature  group 
are  unlikely  to  be  reliably  recovered  from  image  segmentation  and  grouping.  The  integrity  of  the  object 
boundary  topology  is  a  secondary  result  achieved  after  an  initial  recognition  hypothesis  has  occurred. 

Euclidean  information  in  object  models  is  required  for  scene  consistency  techniques  to  work.  One 
approach  is  to  derive  the  Euclidean  properties  from  self-calibrated  camera  views.  Three  or  more  general 
views  with  a  single  camera  are  sufficient  to  derive  internal  camera  parameters  [17,  30],  and  a  3D  scaled 
Euclidean  reconstruction  for  point  sets. 

For manufactured objects, Computer Aided Design (CAD) models can be used to provide a Euclidean
description. However, it should be noted that CAD models used for part design often
do not correspond exactly to the manufactured version of the part. The description obtained
from imagery is more “realistic” and incorporates many details which are not practical to include in a
CAD model, such as fillets and attachment hardware. Conversely, certain CAD features may be irrelevant
in practice since they are not manifested visually in any image.

On  the  other  hand,  it  is  important  to  develop  the  idea  of  Platonic  generalisation  of  a  model  description. 
For  example,  if  an  image  curve  is  sufficiently  straight,  it  can  be  interpreted  as  an  instance  of  an  “ideal”  line 
even  though  its  manifestation  as  an  image  feature  is  never  perfectly  straight.  Similarly,  a  pair  of  profile 
curves  may  match  closely  enough  to  be  considered  the  outline  of  a  rotationally  symmetric  object,  even 
though  they  are  not  perfect  instances  of  such  a  projection.  The  benefit  of  constructing  a  Platonic  ideal 
description  is  that  over  a  large  set  of  views  and  feature  reconstructions,  the  ideal  description  represents 
the  natural  mean  over  the  set  of  reconstructions.  Also,  the  Platonic  description  is  in  accordance  with  the 
formal  mathematical  constraints  used  to  drive  the  grouping  and  indexing  process. 

4.4  MORSE 

These  ideas  are  being  incorporated  into  a  system,  called  MORSE,  whose  implementation  is  currently 
underway.  MORSE  is  named  after  the  detective  character  originated  by  Colin  Dexter,  who  is  able  to 
ferret  out  truth  given  apparently  unpromising  evidence.  Our  earlier  system  for  2D  object  recognition 
is  called  LEWIS,  the  name  of  Morse’s  less  capable  assistant.  MORSE  will  provide  an  environment  for 
research  on  object  representation  for  recognition,  by  providing  a  context  in  which  issues  such  as  the 
distinctiveness  of  representations,  the  usefulness  of  feature  groups,  and  the  significance  of  consistency, 
can be addressed. The system is being implemented in C++ using a class hierarchy based on the Image
Understanding Environment (IUE)*.

The  current  state  of  progress  of  MORSE  is  illustrated  in  figure  23.  Class-based  groupers  have  been 
implemented for surfaces of revolution (section 3.5), canal surfaces (section 3.6), and polyhedra
(section 3.7). In each case the grouping is based solely on the constraints on the structure of the profiles
for  each  class.  Profiles  for  each  class  are  extracted  completely  automatically.  As  is  demonstrated  in  the 
figure,  recognition  proceeds  by  first  recognising  an  object  as  belonging  to  one  of  the  classes  (for  example 
a  surface  of  revolution).  Subsequently  the  object  will  be  identified  (for  example  as  a  particular  vase). 

* The IUE is an ARPA funded project to produce an object-oriented programming environment for vision research. A
central object hierarchy in the IUE is the spatial-object, which incorporates many of the descriptive requirements described
in the previous sections. The IUE also has an extensive set of classes for object and image transformations, which are a
central issue in MORSE.

Figure 23: (a) Original image containing two surfaces of revolution, two canal surfaces, and two polyhedra.
(b) The linked edges computed from (a). Profiles are extracted and grouped automatically from these linked
edges by the class-based groupers. (c) Extracted surface of revolution profiles with axes. Note that gaps in the
edgel chains have been repaired in the recovered profile. (d) Extracted canal profiles. (e) Extracted polyhedra
outlines. (f) Extracted profiles superimposed on original image. All the correct instances of a class have been
grouped, and no false instances grouped.


5  Discussion 

5.1  Critique  of  the  invariance  approach 

To  conclude,  it  is  useful  to  clarify  many  of  the  points  just  presented  by  responding  to  a  number  of  major 
criticisms  which  can  be  made  of  the  invariance  approach  to  recognition.  It  will  be  instructive  to  employ 
these  critical  points  as  a  benchmark  of  the  progress  in  recognition  which  can  be  attributed  to  the  invariant 
framework. 

1. The extreme nature of projective ambiguity: Invariance concentrates on projective representations.
In practice, perspective distortions in images are small and so can be ignored. Furthermore,
the projective equivalence class is too large: a sphere and an ellipsoid are in the same class, as are
a cube and a truncated pyramid. Thus, the recognition system cannot distinguish between them.

Response:  First,  we  have  demonstrated  with  the  planar  recognition  system  that  a  projective 
representation  is  sufficient  for  many  practical  examples.  Second,  although  it  is  almost  always  the 
case that only projective structure can be recovered in a single uncalibrated image of an object,
this  does  not  mean  that  the  recognition  system  is  bound  to  projective  ambiguities.  For  example, 
for  certain  classes,  affine  or  similarity  invariants  can  be  measured  in  a  perspective  image,  e.g.,  a 
structure  repeated  by  translation  (section  3.4).  Euclidean  consistency,  section  4.1.2,  can  be  used  to 
reduce  ambiguity  from  projective  to  similarity  for  an  image  of  multiple  objects. 

2.  The  exclusiveness  of  geometry:  Invariance  at  present  concentrates  on  geometry  to  the  exclusion 
of  other  important  object  properties  that  should  be  used  in  a  recognition  system  such  as:  colour, 
texture (e.g., wood grain), surface markings (e.g., pictures or lettering on a can), and surface
properties (e.g., metal vs dielectric).

Response:  Geometry  very  largely  dominates  object  descriptions  in  the  system  sketched  above. 
There  is  some  way  to  go  before  colour,  texture,  surface  markings  or  surface  properties  can  stand  on 
an  equal  footing  with  geometric  information.  These  are,  at  present,  measured  in  images  relatively 
unreliably  compared  to  geometry.  Nevertheless,  such  cues  fit  into  the  proposed  architecture.  For 
example  surface  markings  and  texture  can  be  used  as  additional  invariant  indexes  (see  section  5.2), 
and  could  certainly  be  used  as  additional  measures  during  verification. 

3.  The  lack  of  abstract  classification:  Invariance  does  not  address  the  problem  of  classification, 
only  identification.  In  a  typical  model-based  system,  the  class  to  which  an  object  belongs  can  be 
determined  only  by  recognising  it  as  a  specific  object.  For  example,  an  unknown  object  might  be 
identified  as  a  “1991  Red  Mazda  323  Hatchback”,  which  is  a  member  of  the  class  “car”.  This  class 
membership  is  determined  by  subsequent  reasoning:  it  cannot  be  directly  identified  as  a  car,  as 
distinct  from  a  fish,  despite  the  differences  between  the  two  classes. 

Response: Abstract classification in its broadest sense presents severe conceptual and
philosophical problems, which we have carefully avoided addressing. Until it is possible to address these
problems  concretely,  by,  for  example,  stating  exactly  what  a  program  that  distinguished  between  a 
general  fish  and  a  general  bicycle  would  do,  they  will  be  difficult  to  solve.  However,  the  architecture 
proposed  contains  a  first  step  in  this  direction,  by  distinguishing  between  classes  of  object  on  the 
basis  of  the  techniques  required  to  construct  representations  from  images.  In  particular,  if  a  group 
of edge segments is classified as, for example, the profile of a rotationally symmetric object,
techniques exist for confirming that classification (in this case, by determining that there is a projective
equivalence, T, on the profile, such that T² = I).

4.  The  rigidity  of  exact  geometry:  Geometry  is  not  the  appropriate  language  to  represent  objects 
such  as  clothes,  plants  and  animals,  which  can  articulate  and  deform.  A  deformable  template,  or 
even  non-geometric  descriptions,  such  as  a  set  of  colour  histograms,  may  be  much  more  effective  in 
representing such objects.

Response: It is not yet clear what representations one would want to extract for objects that
have no clearly defined geometry (for example, what aspects distinguish one shirt from another?).
As a result, exact geometry is probably going to dominate recognition for some time to come.
However, there is clearly a need for a hybrid of geometric invariance and statistics for certain classes
of deformation; invariance would allow for change of viewpoint, while statistics would cover the
deformation.

5.  Complex  objects:  Invariants  might  be  suitable  for  representing  and  recognising  “simple”  parts, 
such as surfaces of revolution or quadrics (“geons”) but do not yet cover assembling these shapes
into  complex  objects,  such  as  telephones  or  aeroplanes. 

Response: Some of the 3D classes, for example, surfaces of revolution or canal surfaces, are
“simple”: essentially, little more than plane curves. This does not affect their usefulness in
representing a large number of real objects. Other classes of objects, for example repeated structures
or polyhedra, are more genuinely 3D.

Objects  consisting  of,  for  example,  a  hierarchy  of  parts,  are  not  explicitly  addressed  in  this  approach. 
However,  the  seeds  of  a  solution  are  present  in  the  use  of  geometric  relations  between  feature  groups 
(intra-object  invariants),  such  as  those  shown  in  figure  3,  in  forming  joint-recognition  hypotheses. 
One could advance the system architecture above to do the same thing, recognising parts individually
and then using projective, or Euclidean information about object pose that results from the
consistency  checking,  to  determine  whether  components  lie  in  such  a  way  as  to  make  up  a  composite 
object. 

Concerning  segmentation,  it  may  be  the  case  that  the  grouping  relations  defined  for  each  class 
provide  a  natural  means  of  segmenting  a  complex  outline  into  primitive  volumetric  parts.  For 
example,  when  the  harmonic  homology  on  the  profile  ceases  to  apply  this  would  indicate  that  the 
surface  of  revolution  part  had  finished. 

Of course, there will be objects, e.g., potatoes, that cannot be represented by a combination of
the  classes  described  here.  However,  such  generic  shapes  are  currently  difficult  to  distinguish  with 
any  representation  under  the  distortions  of  perspective  imaging.  Our  view  is  that  it  is  better  to 
proceed  with  a  set  of  classes  which  can  support  reliable  recognition  and  establish  a  benchmark  of 
performance  for  future  systems  to  build  on. 

5.2  Avenues  of  future  research 

Indexing  allows  fast  recognition  of  objects  drawn  from  a  diverse  collection  of  classes:  a  range  of  specific 
techniques  for  recovering  the  projective  invariants  necessary  for  indexing  various  object  classes  has  been 
displayed and demonstrated. These ideas have been integrated to produce a recognition system
architecture that should be capable of handling large, diverse modelbases, and that addresses many of the
concerns  recently  raised  about  indexing  in  recognition  systems. 

However, object recognition is not yet “solved.” There is a range of avenues of research that promise
exciting  developments;  we  indicate  a  few  topics  most  interesting  to  us: 

• The role of quasi-invariants: Using quasi-invariants for indexing is a problem, because of the
cost  incurred  if  the  “wrong”  object  is  indexed  by  a  quasi-invariant  applied  outside  its  domain  of 
stability.  Avoiding  this  requires  complex  hypothesis  combination  to  ensure  the  “right”  object  is 
indexed.  The  benefit  of  quasi-invariants  is  the  use  of  simpler  feature  groups  at  the  start  of  the 
recognition  process.  However,  simple  feature  groups  are  often  not  very  discriminating.  Instead,  we 
propose  that  quasi- invariants  should  be  used  to  schedule  grouping.  The  quasi-invariants  identified 
in  this  paper  (e.g.,  the  relation  between  sides  of  a  surface  of  revolution  profile),  can  be  used  to 
schedule  promising  groups  for  further  growth. 

•  Learning  invariants:  invariant  indexes  are  a  good  goal  for  a  learning  algorithm;  an  ideal  algorithm 
would,  given  a  large  model  base,  determine  by  some  offline  process  of  generating  views,  the  functions 
and  image  features  most  useful  in  indexing  models  effectively.  Alternatively,  invariants  could  be 
extracted from a large number of real images of an object taken from varying viewpoints. The
advantage  of  the  latter  is  that  the  invariant  descriptors  would  only  involve  features  that  could  be 
reliably  measured  in  images. 

• The use of texture and surface markings: Clearly, texture and surface markings have a part to
play in verification. However, surface markings, together with the profile of certain surface classes,
can also be used to generate further projective invariants, for example by facilitating the backprojection
of the markings onto the surface for that class. These marking invariants can augment indexes
based on the object profile alone. For example, without surface markings, quadrics are projectively
equivalent, but four points (markings) on a quadric surface have two projective invariants in space,
which can be recovered from a single image.

• Extensions to Grouping Computation: In recent experiments with control for grouping features
for rotationally symmetric objects and repeated structures, the idea of synchrony in edgel
curve and line segment linking has emerged. For example, in exploring the topological links along
the occluding boundary of a rotationally symmetric object, it should be possible to use the constraints
of the planar homology to control the linking sequence. In a complex scene with many
possible edgel chain connections, these constraints will considerably reduce the number of feasible
paths generated for symmetrical association. Once a single concavity is determined, the rest of
the boundary can be recovered by a synchronised edge following algorithm. As new parts of the
boundary are confirmed, the homology transform parameters can be iteratively refined.

The  same  type  of  strategy  can  be  followed  for  any  geometric  class  based  on  symmetry  or  structural 
repetition.  The  constraints  inherent  in  these  classes  can  be  extended  right  down  to  the  edgel  linking 
stage.  Such  an  approach  is  currently  being  implemented. 
Is Epipolar Geometry Necessary to Recover Invariants from Multiple Views?

Andrew Zisserman, Richard I. Hartley, Joe L. Mundy and Paul Beardsley


Abstract 

We examine and contrast the projective properties of two simple 3D configurations. The first
consists of six points, four of which are coplanar. We prove that epipolar geometry and the essential
matrix can be recovered uniquely for this structure and give a constructive algorithm for this. The
second configuration has four coplanar points and a single non-coplanar line. In this case it is
not possible to determine the epipolar geometry. However, both structures have two projective
invariants, and these are recoverable from two (uncalibrated) perspective images. We include
examples of the invariants for real objects.


1  Introduction 

A number of recent papers have demonstrated that vision tasks such as recognition and structure recovery
can be accomplished using only projective properties [4, 7, 14]. This contrasts with “conventional”
approaches  where  full  Euclidean  reconstruction  of  3-space  is  sought.  One  of  the  immediate  advantages 
of  the  projective  approach  is  that  no  camera  calibration  is  required.  The  intrinsic  parameters  need 
not  be  known  since  only  projective  properties  of  the  rays  (not  angles)  are  used. 

Given  two  perspective  images  of  particular  3D  configurations  and  assuming  only  image  feature 
correspondences,  we  consider  the  following  questions: 

1.  Can  the  epipolar  geometry  of  the  two  cameras  be  uniquely  recovered? 

2.  Can  projective  invariants  of  the  3D  structure  be  computed? 

(So  called  “multiple  view  invariants”). 

The invariants sought are invariants to transformation by the projective group (i.e. multiplication of the
homogeneous representation of 3-space by an arbitrary non-singular $4 \times 4$ matrix).

If  the  epipolar  geometry  is  known,  then  3D  structure  can  be  recovered  up  to  3D  collineation  i.e.  up 
to  an  arbitrary  projective  transformation  [4,  7].  Consequently,  invariants  to  this  transformation  can 
be  computed  from  the  recovered  structure  (since  they  are  unaffected  by  the  projective  transformation 
relating  the  recovered  and  “true”  Euclidean  configurations).  Here  we  examine  cases  where  multiple 
view  invariants  can  be  obtained  in  the  absence  of  epipolar  geometry.  In  particular  we  contrast  two 
structures: 

1.  Four  coplanar  points,  and  two  non-coplanar  points.  The  non-coplanar  points  must  be  in  “general 
position”.  This  is  made  more  precise  below. 

2.  Four  coplanar  points,  and  a  non-coplanar  line. 

The essential benefit of the four coplanar points is that they define a projective basis for the plane
which can be used to transfer [1, 12] coordinates between the world plane and images. Any other
planar configuration which uniquely defines a projective basis for the plane could equally well be used,
for example four coplanar lines.

Structure (S)                      dim S   dim G_x   #invar
6 points general position            18       0         3
7 points general position (*)        21       0         6
5 points, 4 coplanar                 14       1         0
6 points, 4 coplanar (*)             17       0         2
line and 4 coplanar points           15       2         2
line and 5 points, 4 coplanar        18       0?        3?
2 lines and 4 coplanar points        19       0?        4?

Table 1: The number of functionally independent scalar invariants for 3D configurations under the
action of the projective group. In all cases general position is assumed e.g. the line is not coplanar
with any two points. (*) indicates that the epipolar geometry can be determined from two views of
the structure (though not uniquely in the case of 7 points).

That  the  epipolar  geometry  can  be  recovered  for  the  six  point  structure  has  been  established 
by  [2, 11].  The  derivation  is  repeated  here,  see  figure  1.  We  extend  this  analysis  to  the  determination 
of the essential matrix¹ Q. It is shown that Q is uniquely determined by the set of six point matches.
Further,  a  method  will  be  given  for  computing  Q.  The  method  is  linear  and  non-iterative.  This  result 
is  remarkable,  since  previously  known  methods  have  required  8  points  for  a  linear  solution  [9]  or  7 
points  for  a  solution  involving  finding  the  roots  of  a  cubic  equation  [6].  In  addition,  the  solution  using 
7  points  leads  to  three  possible  solutions,  corresponding  to  the  three  roots  of  the  cubic.  Since  Q  has 
7  degrees  of  freedom  [6]  it  is  not  possible  to  compute  Q  from  less  than  7  arbitrary  points.  Therefore 
it  is  somewhat  surprising  that  the  condition  that  four  of  the  points  are  co-planar  should  mean  that  a 
solution  from  six  points  is  possible  and  unique. 

The second structure (four coplanar points and a line) is interesting because the simple replacement
of  two  points  by  a  line  generates  two  significant  changes:  First,  it  is  not  possible  to  recover  the  epipolar 
geometry  from  this  alone;  second,  the  structure  has  an  isotropy  under  the  projective  group.  However, 
both  structures  have  two  projective  invariants  which  can  be  recovered  from  two  views. 

1.1  Number  of  Invariants  and  Isotropies 

As described in [13] the number of (functionally independent scalar) invariants to the action of a group
$G$ is given by:

$$ \#\text{invar} = \dim S - \dim G + \dim G_x $$

where $\dim S$ is the “dimension” of the structure, $\dim G$ the dimension of $G$, in this case 15, and
$\dim G_x$ the dimension of the isotropy sub-group (if any) which leaves the structure unaffected under
the action of $G$. Examples are given in table 1.
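The counting argument can be checked mechanically against the entries of table 1. The sketch below is only illustrative: the function names are hypothetical, and the degree-of-freedom bookkeeping (3 per point, 4 per line, one constraint removed for each coplanar point beyond the three that define the plane) is an assumption chosen because it reproduces the dim S column of the table.

```python
PROJECTIVE_GROUP_DIM = 15  # dim G: 4x4 non-singular matrices up to scale


def structure_dim(points=0, lines=0, coplanar_points=0):
    """Degrees of freedom of a configuration in projective 3-space.

    Assumed bookkeeping: 3 per point, 4 per line, and one constraint for
    every coplanar point beyond the three that define the plane.
    `coplanar_points` counts points already included in `points`.
    """
    constraints = max(coplanar_points - 3, 0)
    return 3 * points + 4 * lines - constraints


def num_invariants(dim_s, dim_isotropy=0, dim_g=PROJECTIVE_GROUP_DIM):
    """#invar = dim S - dim G + dim G_x."""
    return dim_s - dim_g + dim_isotropy


# Two rows of table 1: six points (four coplanar), and a line with four coplanar points.
six_points = structure_dim(points=6, coplanar_points=4)
line_and_four = structure_dim(points=4, lines=1, coplanar_points=4)
print(six_points, num_invariants(six_points))
print(line_and_four, num_invariants(line_and_four, dim_isotropy=2))
```

The isotropy dimension is an input here, not something the sketch derives; it has to come from an analysis such as the one in section 3.3.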

The key point about an isotropy is that a structure with fewer degrees of freedom than the group
dimension can still have invariants. In section 3.3 we discuss the isotropy of the line and four coplanar
point configuration.

¹This matrix was introduced by Longuet-Higgins [9] assuming the two cameras were calibrated, and has since been
extensively investigated e.g. [10]. Most of the results also apply to uncalibrated cameras of the type considered in this
paper [6].

2  Six  points,  four  coplanar 

Consider a set of matched points $\mathbf{x}'_i \leftrightarrow \mathbf{x}_i$ for $i = 1, \ldots, 6$ and suppose that the points $\mathbf{X}_1, \ldots, \mathbf{X}_4$²
corresponding to the first four matched points lie in a plane in space. Let this plane be denoted by
$\Pi$. Suppose also that no three of the points $\mathbf{X}_1, \ldots, \mathbf{X}_4$ are collinear. Suppose further that the points
$\mathbf{X}_5$ and $\mathbf{X}_6$ do not lie in that plane. Various other assumptions will be necessary in order to rule out
degenerate cases. These will be noted as they occur.

In the following sections we first explain how the epipolar geometry is determined. We then prove
that this configuration is sufficient to uniquely define the Q matrix and hence five of the points may
be used as a projective basis for $\mathcal{P}^3$.

2.1  Epipolar  Geometry 

First it will be shown that the problem may be reduced to the case in which $\mathbf{x}'_i = \mathbf{x}_i$ for $i = 1, \ldots, 4$.
From the assumption that points $\mathbf{X}_1, \ldots, \mathbf{X}_4$ lie in a plane and that no three of them are collinear,
it may be deduced that no three of the points $\mathbf{x}_1, \ldots, \mathbf{x}_4$ are collinear in the first image and that no
three of $\mathbf{x}'_1, \ldots, \mathbf{x}'_4$ are collinear in the second image. Given this, it is possible in a straightforward
manner to find a $3 \times 3$ projective transformation matrix $T$, such that $\mathbf{x}'_i = T\mathbf{x}_i$, $i \in \{1, \ldots, 4\}$. Denoting
$T\mathbf{x}_i$ by the new symbol $\mathbf{x}''_i$, we see that $\mathbf{x}'_i = \mathbf{x}''_i$ for $i = 1, \ldots, 4$.

Therefore, we will assume for now that $\mathbf{x}'_i = \mathbf{x}_i$ for $i = 1, \ldots, 4$. This being so, it is possible to
characterize the points that lie in the plane $\Pi$ defined by $\mathbf{X}_1, \ldots, \mathbf{X}_4$. A point $\mathbf{Y}$ lies in the plane $\Pi$ if
and only if it is mapped to the same point in both images.

Now consider any point $\mathbf{Y}$ in space, not on $\Pi$, and consider the epipolar plane defined by $\mathbf{Y}$ and
the two camera centres (see figure 1). This plane will meet the plane $\Pi$ in a straight line $L(\mathbf{Y}) \subset \Pi$.
The line $L(\mathbf{Y})$ must pass through the point $\mathbf{P}$ in which the line of the camera centres meets the plane
$\Pi$. This means that for all points $\mathbf{Y}$ the lines $L(\mathbf{Y})$ are concurrent, and meet at the point $\mathbf{P}$. Now we
consider the images of the line $L(\mathbf{Y})$ and the point $\mathbf{P}$ as seen from the two cameras. Since the line
$L(\mathbf{Y})$ lies in the plane $\Pi$ it must be the same as seen from both the cameras. Let the image of $L(\mathbf{Y})$
as seen in either image be $\ell(\mathbf{Y})$. If $\mathbf{y}$ and $\mathbf{y}'$ are the image points at which $\mathbf{Y}$ is seen from the two
cameras, then both points $\mathbf{y}$ and $\mathbf{y}'$ must lie on the line $\ell(\mathbf{Y})$. Since the point $\mathbf{P}$ lies in the plane $\Pi$, it
must map to the same point in both images, so $\mathbf{p} = \mathbf{p}'$ and this point lies on the line $\ell(\mathbf{Y})$. Therefore,
$\mathbf{y}$, $\mathbf{y}'$ and $\mathbf{p}$ are collinear. The point $\mathbf{p}$ can be identified as the epipole in the first image, since
$\mathbf{P}$ and the two camera centres are collinear. Similarly, $\mathbf{p}'$ is the epipole in the second image. Thus one
point not on $\Pi$ is sufficient to determine a line in each image on which the epipole must lie³.

This discussion may now be applied to the points $\mathbf{X}_5$ and $\mathbf{X}_6$. Since $\mathbf{X}_5$ and $\mathbf{X}_6$ do not lie in the
plane $\Pi$ it follows that $\mathbf{x}_5 \neq \mathbf{x}'_5$ and $\mathbf{x}_6 \neq \mathbf{x}'_6$. Then the point $\mathbf{p}$ may easily be found as the point
of intersection of the lines $\langle \mathbf{x}_5, \mathbf{x}'_5 \rangle$ and $\langle \mathbf{x}_6, \mathbf{x}'_6 \rangle$. As an aside, the point of intersection of the
lines $\langle \mathbf{x}_5, \mathbf{x}_6 \rangle$ and $\langle \mathbf{x}'_5, \mathbf{x}'_6 \rangle$ is of interest as being the image of the point where the line
$\langle \mathbf{X}_5, \mathbf{X}_6 \rangle$ meets the plane $\Pi$, see section 2.3.

²We adopt the notation that corresponding points in the world and image are distinguished by large and small letters.
Vectors are written in bold font, e.g. $\mathbf{x}$ and $\mathbf{X}$. Homogeneous representations are used e.g. $\mathbf{X}_i = (X_i, Y_i, Z_i, 1)^{\top}$. $\mathbf{x}$ and
$\mathbf{x}'$ are corresponding image points in two views.

³Another way to see this is that $\mathbf{Y}$ and $\mathbf{Y}_1$ (a virtual point), see figure 1, are collinear in the first image. This is the
condition for motion parallax. As described in [8], their positions in the second image ($\mathbf{y}'$ and $T\mathbf{y}$) are coincident with
the focus of expansion (the epipole). We are grateful to Andrew Blake for this observation.
Figure 1: Epipolar geometry. The points $\mathbf{X}_1, \ldots, \mathbf{X}_4$ are coplanar, with images $\mathbf{x}_i$ and $\mathbf{x}'_i$ in the first
and second images respectively. The epipolar plane defined by the point $\mathbf{Y}$ and optical centers $\mathbf{O}$ and
$\mathbf{O}'$ intersects the plane $\Pi$ in the line $L(\mathbf{Y}) = \langle \mathbf{Y}_1, \mathbf{Y}_2 \rangle$, where $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are the intersections of $\Pi$
with the lines $\langle \mathbf{Y}, \mathbf{O} \rangle$ and $\langle \mathbf{Y}, \mathbf{O}' \rangle$ respectively.

The epipolar line may be constructed in the second image as follows: Determine the plane projective
transformation such that $\mathbf{x}'_i = T\mathbf{x}_i$, $i \in \{1, \ldots, 4\}$. Use this transformation to transfer the point $\mathbf{y}$ to
$T\mathbf{y}$. This determines two points in the second image, $\mathbf{y}'$ and $T\mathbf{y}$, which are projections of points
($\mathbf{Y}$ and $\mathbf{Y}_1$) on the line $\langle \mathbf{O}, \mathbf{Y} \rangle$. This defines the epipolar line of $\mathbf{Y}$ in the second image. A second
point, not on $\Pi$, will define its corresponding epipolar line. The epipole lies on both lines, so is
determined by their intersection. A similar construction gives epipolar lines and hence the epipole in
the first image.



The previous discussion indicates how the epipole may be found. This construction will succeed
unless the two lines $\langle \mathbf{x}_5, \mathbf{x}'_5 \rangle$ and $\langle \mathbf{x}_6, \mathbf{x}'_6 \rangle$ are the same. The two lines will be distinct unless the
two points $\mathbf{X}_5$ and $\mathbf{X}_6$ lie in a common plane with the two camera centres.

To  summarise: 


1. Calculate the plane projective transformation matrix $T$, such that $\mathbf{x}'_i = T\mathbf{x}_i$, $i \in \{1, \ldots, 4\}$.

2. Determine the epipole, $\mathbf{p}'$, in the second image as the intersection of the lines $\langle T\mathbf{x}_5, \mathbf{x}'_5 \rangle$ and
$\langle T\mathbf{x}_6, \mathbf{x}'_6 \rangle$. Note, these lines are given by $T\mathbf{x}_i \times \mathbf{x}'_i$, $i \in \{5, 6\}$ [16]. Similarly, the epipole in
the first image is the intersection of the lines $T^{-1}\mathbf{x}'_i \times \mathbf{x}_i$, $i \in \{5, 6\}$.

3. The epipolar line in the second image corresponding to a point $\mathbf{x}$ in the first is given by $T\mathbf{x} \times \mathbf{p}'$.
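Steps 2 and 3 of this summary reduce to a handful of cross products once $T$ is available. The following is a minimal sketch in plain Python, assuming $T$ has already been estimated from the four coplanar correspondences (step 1 is not implemented here). All function names are hypothetical, and the synthetic two-camera setup used for the check (first camera $[I \mid 0]$, second camera $[I \mid \mathbf{t}]$, plane $Z = d$) is an assumption made for illustration, not data from the paper.

```python
def cross(a, b):
    """Cross product: joins two homogeneous points into a line,
    or intersects two homogeneous lines in a point."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mat_vec(m, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(dot(row, v) for row in m)


def epipole_second_image(T, off_plane_pairs):
    """Intersect the lines T x_i x x'_i for the two off-plane matches (x_i, x'_i)."""
    (x5, x5p), (x6, x6p) = off_plane_pairs
    line5 = cross(mat_vec(T, x5), x5p)
    line6 = cross(mat_vec(T, x6), x6p)
    return cross(line5, line6)


def epipolar_line(T, x, epipole):
    """Epipolar line in the second image for a first-image point x: T x x p'."""
    return cross(mat_vec(T, x), epipole)


# Assumed synthetic setup: cameras [I | 0] and [I | t], world plane Z = d.
# The plane-induced map is then T = I + t n^T / d with n = (0, 0, 1),
# and the second-image epipole is t (up to scale).
t = (1.0, -0.5, 0.2)
d = 4.0
T = ((1.0, 0.0, t[0] / d),
     (0.0, 1.0, t[1] / d),
     (0.0, 0.0, 1.0 + t[2] / d))


def image_pair(X):
    """Project a world point (X, Y, Z) into both synthetic views."""
    return X, tuple(X[k] + t[k] for k in range(3))


epipole = epipole_second_image(
    T, [image_pair((1.0, 2.0, 5.0)), image_pair((-1.0, 0.5, 3.0))])
print(epipole)  # a homogeneous 3-vector, defined only up to scale
```

The construction fails in exactly the case noted in the text: if the two off-plane points are coplanar with both camera centres, the two lines coincide and their cross product vanishes.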

2.2  Computation  of  Essential  Matrix 
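The essential matrix, $Q$, satisfies the condition

$$ \mathbf{x}_i'^{\top} Q\, \mathbf{x}_i = 0 \qquad (1) $$

for all $i$. As in the previous section the problem of determining the matrix $Q$ is reduced to the case in
which $\mathbf{x}'_i = \mathbf{x}_i$ for $i = 1, \ldots, 4$. If $\mathbf{x}''_i = T\mathbf{x}_i$ for $i = 1, \ldots, 4$, then

$$ 0 = \mathbf{x}_i'^{\top} Q\, \mathbf{x}_i = \mathbf{x}_i'^{\top} Q\, T^{-1} \mathbf{x}''_i . \qquad (2) $$

So, denoting $Q_1 = Q T^{-1}$, the task now becomes that of determining $Q_1$ such that

$$ \mathbf{x}_i'^{\top} Q_1 \mathbf{x}''_i = 0 \qquad (3) $$

for all $i$. In addition, $\mathbf{x}'_i = \mathbf{x}''_i$ for $i = 1, \ldots, 4$. Once $Q_1$ has been determined, the original matrix $Q$
may be retrieved using the relationship

$$ Q = Q_1 T . \qquad (4) $$

Now, if $Q$ is the essential matrix corresponding to the set of matched points, then since $\mathbf{p}$ is the
epipole in the first image, we have an equation

$$ Q \mathbf{p} = 0 $$

and since $\mathbf{p}' = \mathbf{p}$ is the epipole in the second image, it follows also that

$$ \mathbf{p}^{\top} Q = 0 . $$

Furthermore, for $i = 1, \ldots, 4$, we have $\mathbf{x}_i = \mathbf{x}'_i$, and so $\mathbf{x}_i^{\top} Q \mathbf{x}_i = 0$. For $i = 5, 6$, we have $\mathbf{x}'_i = \mathbf{x}_i + \alpha_i \mathbf{p}$.
Therefore, $0 = \mathbf{x}_i'^{\top} Q \mathbf{x}_i = (\mathbf{x}_i + \alpha_i \mathbf{p})^{\top} Q \mathbf{x}_i = \mathbf{x}_i^{\top} Q \mathbf{x}_i$. So for all $i = 1, \ldots, 6$,

$$ \mathbf{x}_i^{\top} Q\, \mathbf{x}_i = 0 . $$

This should give more than enough equations in general to solve for $Q$; however, the existence and
uniqueness of the solution need to be proven.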



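Now, a new piece of notation will be introduced. For any vector $\mathbf{t} = (t_x, t_y, t_z)^{\top}$ we define a
skew-symmetric matrix, $S(\mathbf{t})$, according to

$$ S(\mathbf{t}) = \begin{pmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \end{pmatrix} . \qquad (5) $$

Any $3 \times 3$ skew-symmetric matrix can be represented in this way for some vector $\mathbf{t}$. Matrix $S(\mathbf{t})$ is a
singular matrix of rank 2, unless $\mathbf{t} = 0$. Furthermore, the null-space of $S(\mathbf{t})$ is generated by the vector
$\mathbf{t}$. This means that $\mathbf{t}^{\top} S(\mathbf{t}) = S(\mathbf{t})\,\mathbf{t} = 0$ and that any other vector annihilated by $S(\mathbf{t})$ is a scalar
multiple of $\mathbf{t}$.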
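The properties of $S(\mathbf{t})$ stated above are easy to confirm numerically. The sketch below (hypothetical helper names, plain Python) builds the skew-symmetric matrix and checks that $\mathbf{t}$ spans its null-space on both sides, that $S(\mathbf{t})\mathbf{x}$ is the cross product $\mathbf{t} \times \mathbf{x}$, and that the quadratic form $\mathbf{x}^{\top} S(\mathbf{t}) \mathbf{x}$ vanishes, which is the fact used later in the existence part of the lemma.

```python
def skew(t):
    """Skew-symmetric matrix S(t) such that S(t) x equals the cross product t x x."""
    tx, ty, tz = t
    return ((0, -tz, ty),
            (tz, 0, -tx),
            (-ty, tx, 0))


def mat_vec(m, v):
    """Matrix times column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def vec_mat(v, m):
    """Row vector times matrix."""
    return tuple(sum(v[r] * m[r][c] for r in range(3)) for c in range(3))


def quad_form(x, m):
    """x^T M x."""
    return sum(x[r] * m[r][c] * x[c] for r in range(3) for c in range(3))


t = (1, 2, 3)
x = (4, -1, 7)
S = skew(t)
print(mat_vec(S, t), vec_mat(t, S), quad_form(x, S))
```

Integer inputs are used so that the zero results are exact rather than subject to rounding.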

We  now  prove  the  existence  and  uniqueness  of  the  solution  for  the  essential  matrix. 

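Lemma 2.1 Let $\mathbf{p}$ be a point in projective 2-space and let $\{\mathbf{x}_i\}$ be a further set of points. If there are
at least three distinct lines among the lines $\langle \mathbf{p}, \mathbf{x}_i \rangle$ then there exists a unique matrix $Q$ such that

$$ \mathbf{p}^{\top} Q = Q \mathbf{p} = 0 $$

and for all $i$

$$ \mathbf{x}_i^{\top} Q\, \mathbf{x}_i = 0 . $$

Furthermore, $Q$ is skew-symmetric, and hence $Q \approx S(\mathbf{p})$.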

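Proof: Let us assume without loss of generality that the lines $\langle \mathbf{p}, \mathbf{x}_i \rangle$ for $i = 1, \ldots, 3$ are distinct.
Let $T_2$ be a non-singular matrix such that

$$ T_2 \mathbf{p} = (0, 0, 1)^{\top}, \qquad T_2 \mathbf{x}_1 = (1, 0, 0)^{\top}, \qquad T_2 \mathbf{x}_2 = (0, 1, 0)^{\top} . $$

Suppose that $T_2 \mathbf{x}_3 = (r, s, t)^{\top}$. Since the lines $\langle \mathbf{p}, \mathbf{x}_i \rangle$ are distinct, so must be the lines
$\langle T_2 \mathbf{p}, T_2 \mathbf{x}_i \rangle$. From this it follows that both $r$ and $s$ are non-zero, for otherwise, the line $\langle T_2 \mathbf{p}, T_2 \mathbf{x}_3 \rangle$
must be the same as $\langle T_2 \mathbf{p}, T_2 \mathbf{x}_i \rangle$ for $i = 1$ or $2$. Now, define the matrix $Q_2 = T_2^{-\top} Q\, T_2^{-1}$. Then

$$ T_2^{\top} Q_2 (0, 0, 1)^{\top} = T_2^{\top} Q_2 T_2 \mathbf{p} = Q \mathbf{p} = 0 $$

and so

$$ Q_2 (0, 0, 1)^{\top} = 0 . \qquad (6) $$

Similarly,

$$ (0, 0, 1)\, Q_2 = 0 . \qquad (7) $$

Next,

$$ (1, 0, 0)\, Q_2 (1, 0, 0)^{\top} = \mathbf{x}_1^{\top} T_2^{\top} Q_2 T_2 \mathbf{x}_1 = \mathbf{x}_1^{\top} Q \mathbf{x}_1 = 0 \qquad (8) $$

and similarly,

$$ (0, 1, 0)\, Q_2 (0, 1, 0)^{\top} = 0 \qquad (9) $$

and

$$ (r, s, t)\, Q_2 (r, s, t)^{\top} = 0 . \qquad (10) $$

Now, writing

$$ Q_2 = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & j \end{pmatrix} $$

equation (6) implies $c = f = j = 0$. Equation (7) implies $g = h = j = 0$. Equation (8) implies $a = 0$
and equation (9) implies $e = 0$. Finally, equation (10) implies $rs(b + d) = 0$ and since $rs \neq 0$ this
yields $b + d = 0$. So,

$$ Q_2 = \begin{pmatrix} 0 & b & 0 \\ -b & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} $$

which is skew-symmetric. Therefore, $Q = T_2^{\top} Q_2 T_2$ is also skew-symmetric.

The first part of the lemma has been proven. Now, since $Q$ is skew-symmetric and $Q\mathbf{p} = 0$, it
follows that $Q \approx S(\mathbf{p})$, as required. This shows uniqueness of the essential matrix $Q$. To show the
existence of a matrix $Q$ satisfying all the conditions of the lemma, it is sufficient to observe that a
skew-symmetric matrix $Q$ has the property that $\mathbf{x}_i^{\top} Q \mathbf{x}_i = 0$ for any vector $\mathbf{x}_i$. □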

This  lemma  allows  us  to  give  an  explicit  form  for  the  matrix  Q  expressed  in  terms  of  the  original 
matched  points. 

Theorem 2.2 Let $\{\mathbf{x}'_i\} \leftrightarrow \{\mathbf{x}_i\}$ be a set of 6 image correspondences derived from 6 points $\mathbf{X}_i$ in space,
and suppose it is known that the points $\mathbf{X}_1, \ldots, \mathbf{X}_4$ lie in a plane. Let $T$ be a $3 \times 3$ matrix such that
$\mathbf{x}'_i = T\mathbf{x}_i$ for $i = 1, \ldots, 4$. Suppose that the lines $\langle \mathbf{x}'_5, T\mathbf{x}_5 \rangle$ and $\langle \mathbf{x}'_6, T\mathbf{x}_6 \rangle$ are distinct and let $\mathbf{p}$ be
their intersection. Suppose further that among the lines $\langle \mathbf{x}'_i, \mathbf{p} \rangle$ there are at least three distinct lines.
Then there exists a unique essential matrix $Q$ satisfying the point correspondences and the condition
of coplanarity of the points $\mathbf{X}_1, \ldots, \mathbf{X}_4$ and $Q$ is given by the formula

$$ Q = S(\mathbf{p})\, T $$

The  conditions  under  which  a  unique  solution  exists  may  be  expressed  in  geometrical  terms. 
Namely  : 

1. Points $\mathbf{X}_1, \ldots, \mathbf{X}_4$ lie in a plane $\Pi$, but no three of them are collinear.

2. Points $\mathbf{X}_5$ and $\mathbf{X}_6$ do not lie in the plane $\Pi$, and do not lie in a common plane passing through
the two camera centres.

3. The points $\mathbf{X}_1, \ldots, \mathbf{X}_6$ do not all lie in two planes passing through the camera centres.

Under  the  above  conditions,  the  essential  matrix  Q  is  determined  uniquely  by  the  set  of  image 
correspondences.  Note  that  according  to  [4,  7],  this  in  turn  determines  the  locations  of  the  points 
themselves  and  the  cameras  up  to  a  projective  transformation  of  3-space. 
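The formula of Theorem 2.2 can be exercised on synthetic data. The sketch below is an illustration only: it assumes a pure-translation camera pair ($[I \mid 0]$ and $[I \mid \mathbf{t}]$) viewing the plane $Z = d$, for which the plane-induced map is $T = I + \mathbf{t}\mathbf{n}^{\top}/d$ with $\mathbf{n} = (0, 0, 1)^{\top}$. Function names and the test points are hypothetical. It forms $Q = S(\mathbf{p})T$ from the two off-plane matches and evaluates the residuals $\mathbf{x}_i'^{\top} Q \mathbf{x}_i$ for all six points.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def mat_vec(m, v):
    return tuple(dot(row, v) for row in m)


def mat_mul(a, b):
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3))
                 for r in range(3))


def skew(t):
    return ((0.0, -t[2], t[1]), (t[2], 0.0, -t[0]), (-t[1], t[0], 0.0))


def essential_from_plane_and_two_points(T, pair5, pair6):
    """Q = S(p) T, with p the intersection of <x'_5, T x_5> and <x'_6, T x_6>."""
    (x5, x5p), (x6, x6p) = pair5, pair6
    p = cross(cross(x5p, mat_vec(T, x5)), cross(x6p, mat_vec(T, x6)))
    return mat_mul(skew(p), T), p


# Assumed synthetic scene (not from the paper).
t, d = (1.0, -0.5, 0.2), 4.0
T = ((1.0, 0.0, t[0] / d), (0.0, 1.0, t[1] / d), (0.0, 0.0, 1.0 + t[2] / d))
world = [(0.0, 0.0, d), (1.0, 0.0, d), (0.0, 1.0, d), (1.0, 1.0, d),  # coplanar
         (1.0, 2.0, 5.0), (-1.0, 0.5, 3.0)]                            # off-plane
pairs = [(X, tuple(X[k] + t[k] for k in range(3))) for X in world]

Q, p = essential_from_plane_and_two_points(T, pairs[4], pairs[5])
residuals = [dot(xp, mat_vec(Q, x)) for x, xp in pairs]
print(residuals)
```

Because $Q$ is homogeneous, only the vanishing of the residuals is meaningful, not their scale; with noisy measurements one would expect small non-zero values.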

2.3  Projective  invariants 

The meaning of the 3D projective invariants can most readily be appreciated from figure 2. The
line, A, formed from the two non-coplanar points intersects the plane $\Pi$ in a unique point $\mathbf{X}_I$. This
construction is unaffected by projective transformations of $\mathcal{P}^3$. There are then 5 coplanar points and
consequently two plane projective invariants, which are also invariants of the 3D transformation.

As described in section 2.1, the image of $\mathbf{X}_I$ can be determined from two views (see [15] for an
alternative derivation). We then have the image of five coplanar points, for which plane projective
invariants may be calculated. These invariants have the same value calculated on $\Pi$ or from any
projection of $\Pi$. This construction does not require epipolar calibration to be known.


Figure 2: The projective invariant of 6 points, 4 coplanar (points 1-4), can be computed by intersecting
the line, A, through the non-planar points (5 and 6) with the common plane. There are then 5 coplanar
points, for which two invariants to the plane projective group can be calculated.

To summarise:

1. Determine the imaged intersection $\mathbf{x}_I$ of the plane $\Pi$ and the line A in the second image as the
intersection of the lines $T\mathbf{x}_5 \times T\mathbf{x}_6$ and $\mathbf{x}'_5 \times \mathbf{x}'_6$.

2. Calculate the two plane projective invariants of five points, $\mathbf{x}'_i$, $i \in \{1, \ldots, 4\}$, and $\mathbf{x}'_5 = \mathbf{x}_I$. These
are given by

$$ I_1 = \frac{|m_{125}|\,|m_{134}|}{|m_{124}|\,|m_{135}|}, \qquad I_2 = \frac{|m_{124}|\,|m_{235}|}{|m_{234}|\,|m_{125}|} \qquad (11) $$

where $m_{jkl}$ is the matrix $[\mathbf{x}'_j\ \mathbf{x}'_k\ \mathbf{x}'_l]$ and $|\cdot|$ is a determinant.
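The two five-point invariants defined above can be evaluated directly from $3 \times 3$ determinants. The sketch below is a hypothetical helper in plain Python; it assumes the points are supplied as homogeneous 3-vectors in the order 1 to 5, with the fifth being the imaged intersection point. As a sanity check it evaluates the invariants for a canonical basis with fourth point $(\alpha, \beta, \gamma) = (2, 3, 5)$ and fifth point $(1, 1, 1)$, then again after an arbitrary plane projective map and arbitrary rescaling of each point.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def five_point_invariants(pts):
    """Return (I1, I2) for five coplanar points given as homogeneous 3-vectors."""
    def m(j, k, l):  # 1-based indices, matching the subscripts in the text
        return det3(pts[j - 1], pts[k - 1], pts[l - 1])
    i1 = (m(1, 2, 5) * m(1, 3, 4)) / (m(1, 2, 4) * m(1, 3, 5))
    i2 = (m(1, 2, 4) * m(2, 3, 5)) / (m(2, 3, 4) * m(1, 2, 5))
    return i1, i2


def apply_h(h, x, scale=1.0):
    """Apply a 3x3 projective map and an arbitrary homogeneous rescaling."""
    return tuple(scale * sum(h[r][c] * x[c] for c in range(3)) for r in range(3))


canonical = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
             (2.0, 3.0, 5.0), (1.0, 1.0, 1.0)]
H = ((2.0, 1.0, 0.0), (0.0, 1.0, 3.0), (1.0, 0.0, 1.0))  # arbitrary, non-singular
scales = [1.0, -2.0, 0.5, 3.0, 7.0]
mapped = [apply_h(H, x, s) for x, s in zip(canonical, scales)]

print(five_point_invariants(canonical))
print(five_point_invariants(mapped))
```

Each point index appears equally often in the numerator and denominator of both ratios, which is why the per-point scale factors and the determinant of the map cancel.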

2.3.1  Relation  between  2D  invariants  and  algebraic  invariants  of  3D  points 

Six 3D points in $\mathcal{P}^3$ in general position have three projective invariants. The coplanarity reduces
by one the number of invariants (one of the invariants will be zero). We may arbitrarily choose
coordinates for 5 of the points of the six point configuration (any other coordinates of the five points
can be transformed to these by a collineation of $\mathcal{P}^3$):

$$ \mathbf{X}_1 = (1, 0, 0, 0)^{\top}, \quad \mathbf{X}_2 = (0, 1, 0, 0)^{\top}, \quad \mathbf{X}_3 = (0, 0, 1, 0)^{\top}, \quad \mathbf{X}_5 = (0, 0, 0, 1)^{\top}, \quad \mathbf{X}_6 = (1, 1, 1, 1)^{\top} $$

The fourth coplanar point then has coordinates:

$$ \mathbf{X}_4 = (\alpha, \beta, \gamma, 0)^{\top} $$

The coordinates of this point give rise to the two independent projective invariants of the six points:

$$ I_1 = \alpha/\gamma, \qquad I_2 = \beta/\gamma \qquad (12) $$

For the five point planar invariants we use the coordinates of the first four points restricted to the
plane (the subordinate geometry):

$$ \mathbf{X}_1 = (1, 0, 0)^{\top}, \quad \mathbf{X}_2 = (0, 1, 0)^{\top}, \quad \mathbf{X}_3 = (0, 0, 1)^{\top}, \quad \mathbf{X}_4 = (\alpha, \beta, \gamma)^{\top} $$

The intersection of the line $\langle \mathbf{X}_5, \mathbf{X}_6 \rangle$ with $\Pi$ is given by

$$ \mathbf{X}_I = (1, 1, 1)^{\top} $$

Then from (11) the five point planar invariants are:

$$ I_1 = \beta/\gamma, \qquad I_2 = \gamma/\alpha \qquad (13) $$

i.e. simply functions of the two 3D invariants in (12) above as expected.

3  A  line  and  four  coplanar  points 

Here the two non-coplanar points of the previous section are “replaced” by a line, A. As with the
previous configuration this structure has two projective invariants (which are determined from the
two five point invariants of the four coplanar points and the intersection of the line with the plane
$\Pi$) which can be determined from two views. However, it is no longer possible to recover the epipolar
geometry; it is not even possible to restrict the epipole to a line in each image.

3.1  Constraints  on  epipolar  geometry 

Surprisingly the image of A in each view adds no constraints at all towards solving for the epipolar
geometry or essential matrix. To see this geometrically consider the back projection of a point imaged
in two views. Each point back projects to a ray. In general two lines are skew in $\mathcal{P}^3$, so the condition
that they intersect (since they arise from a common point) constrains the imaging geometry. In
contrast the back projection of a line is a plane, and in general two planes intersect in a line in $\mathcal{P}^3$.
Consequently, no constraint is given. Adding a third view does constrain the geometry since three
planes intersect in a point in general, not a line, in $\mathcal{P}^3$.

3.2  Projective  invariants 

As in the six point case the invariant under projective transformations of $\mathcal{P}^3$ can be obtained from the
five point planar invariants of the four coplanar points and the intersection of A with $\Pi$. Again this
can be calculated from two views, where here the plane projective transformation is used to transfer
a line, the image of A. This construction does not require epipolar calibration to be known.

As  described  in  table  1,  the  line  and  four  coplanar  point  configuration  has  only  15  degrees  of 
freedom.  That  two  invariants  can  be  constructed  indicates  the  presence  of  an  isotropy  sub-group.  The 
action  of  this  group  is  described  below. 


To summarise:

Given a set of matched points $\mathbf{x}'_i \leftrightarrow \mathbf{x}_i$ for $i = 1, \ldots, 4$ which are the images of coplanar points,
together with the images $\mathbf{l}$ and $\mathbf{l}'$ of a line in $\mathcal{P}^3$ not on $\Pi$⁴:

⁴If A does lie on $\Pi$, the transferred line will be coincident with $\mathbf{l}'$, i.e. $\mathbf{l}' = T^{-\top}\mathbf{l}$.

1. Determine the imaged intersection $\mathbf{x}_I$ of the plane $\Pi$ and the line A in the second image as the
intersection of the lines $\mathbf{l}'$ and $T^{-\top}\mathbf{l}$. This is given by $\mathbf{x}_I = \mathbf{l}' \times T^{-\top}\mathbf{l}$ [16].

2. Calculate the two plane projective invariants of five points, $\mathbf{x}'_i$, $i \in \{1, \ldots, 4\}$, and $\mathbf{x}'_5 = \mathbf{x}_I$ using
equation (11).
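Step 1 of this summary relies on the fact that when points map as $\mathbf{x}' = T\mathbf{x}$, lines map by the inverse transpose of $T$. A minimal sketch follows, with hypothetical function names; it uses the cofactor matrix of $T$, which equals the inverse transpose up to a scale factor and so avoids an explicit matrix inversion. The synthetic scene (cameras $[I \mid 0]$ and $[I \mid \mathbf{t}]$, plane $Z = d$) is an assumption made for the check and does not come from the paper.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def transfer_line(T, l):
    """Map a first-image line into the second image.

    Lines transform by the inverse transpose of T; the cofactor matrix
    used here equals that matrix up to scale (multiplied by det T).
    """
    r0, r1, r2 = T
    cofactor_rows = (cross(r1, r2), cross(r2, r0), cross(r0, r1))
    return tuple(dot(row, l) for row in cofactor_rows)


def imaged_intersection(T, l, l_prime):
    """Second-image point where the world line meets the plane: l' x (transferred l)."""
    return cross(l_prime, transfer_line(T, l))


# Assumed synthetic scene (not from the paper).
t, d = (1.0, -0.5, 0.2), 4.0
T = ((1.0, 0.0, t[0] / d), (0.0, 1.0, t[1] / d), (0.0, 0.0, 1.0 + t[2] / d))
A, B = (1.0, 2.0, 5.0), (-1.0, 0.5, 3.0)       # two points on the world line
A2 = tuple(A[k] + t[k] for k in range(3))       # their images in the second view
B2 = tuple(B[k] + t[k] for k in range(3))
l, l_prime = cross(A, B), cross(A2, B2)         # imaged lines in views 1 and 2

x_I = imaged_intersection(T, l, l_prime)
print(x_I)  # homogeneous, defined up to scale
```

If the world line lies on the plane, the transferred line coincides with the second-image line and the cross product is the zero vector, so a practical implementation would need to test for that case.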

3.3  Existence  of  an  Isotropy 

As explained in section 1.1, in order for the line and four coplanar points to have two invariants
under collineation of $\mathcal{P}^3$ there must be an isotropy sub-group acting. In this section we give a simple
derivation of this sub-group which leaves the structure unchanged under the action of the projective
group, and determine its action on $\mathcal{P}^3$.

The  construction  of  the  sub-group  is  in  two  stages: 

1. Construct the sub-group for which $\Pi$ is a plane of fixed points. This is necessary since four points
remain fixed under the action of the sub-group, and consequently every point on the plane is
unchanged (as four points define a basis for the plane).

2. Construct the sub-group of (1) for which the line A is a fixed line. Note this does not have
to be a line of fixed points since only one point on the line (its intersection with $\Pi$) must be
unchanged.

We adopt the notation of section 2.3.1 for the six points. The line A is given in its homogeneous
parametric representation by

$$ A = \zeta (1, 1, 1, 1)^{\top} + \eta (0, 0, 0, 1)^{\top} \qquad (14) $$

First, in order for $\Pi$ to be a plane of fixed points it is necessary and sufficient that the $4 \times 4$
transformation matrix $T$ satisfies

$$ \mathbf{X}_i = T \mathbf{X}_i, \quad i \in \{1, \ldots, 4\} $$

It is a simple matter to show that $T$ must have the form

$$ T = \begin{pmatrix} \mu_1 & 0 & 0 & \mu_2 \\ 0 & \mu_1 & 0 & \mu_3 \\ 0 & 0 & \mu_1 & \mu_4 \\ 0 & 0 & 0 & \mu_5 \end{pmatrix} \qquad (15) $$

where $\mu_i$, $i \in \{1, \ldots, 5\}$ parameterise the sub-group which has four dof (only their ratio is significant).

Second, we determine the sub-group which leaves A fixed. This can be carried out using Plückerian
line coordinates [16], but here we use the parametric representation (14) above. Under the action of
the isotropy group the points on the line need not be fixed, but the transformed points must still lie
on A. The transformation of two points is sufficient to determine the transformed line (three are
required to determine the transformation of all the points on the line). By inspection $T\mathbf{X}_5$ and $T\mathbf{X}_6$
satisfy (14) iff $\mu_2 = \mu_3 = \mu_4$. Hence we arrive at

$$ T = \begin{pmatrix} \mu_1 & 0 & 0 & \mu_2 \\ 0 & \mu_1 & 0 & \mu_2 \\ 0 & 0 & \mu_1 & \mu_2 \\ 0 & 0 & 0 & \mu_5 \end{pmatrix} \qquad (16) $$

which is a two dimensional sub-group of the collineations of $\mathcal{P}^3$.
Images          I_1      I_2
D,A            0.440   -0.968
D,B            0.378   -1.117
B,A            0.371   -1.170
C,E            0.370   -1.150
F,A            0.333   -1.314
D,A,B          0.372   -1.151
C,E,D          0.369   -1.148
F,A,C          0.370   -1.196
C,A,B,D,E      0.375   -1.140
F,A,B,C,D,E    0.369   -1.170

Table 2: This table shows the line and four coplanar point invariants extracted from several combinations
of views using points 2,4,14,17 and the line between points 6 and 13.

It is interesting to examine the transformation of $P^3$ under the action of $T$. The clearest way to see
this is to determine the eigenvectors of $T$. These are the fixed points of the collineation. We find

$$
\begin{aligned}
E_1 &= (1,0,0,0)^T \\
E_2 &= (0,1,0,0)^T \\
E_3 &= (0,0,1,0)^T \\
E_4 &= (\mu_2, \mu_2, \mu_2, \mu_5 - \mu_1)^T
\end{aligned}
$$

The first three are degenerate with eigenvalue $\mu_1$; the fourth has eigenvalue $\mu_5$. As expected, any
point on the plane, $X = \nu_1 X_1 + \nu_2 X_2 + \nu_3 X_3$, is unchanged by $T$ (since after the transformation all the
basis vectors are multiplied by $\mu_1$). The fourth eigenvector is a fixed point on $\Lambda$. To see the effect
of the isotropy group on points not on $\Pi$, consider any line $L$ containing $E_4$. This will intersect $\Pi$ at
some point, $X_\Pi$ say, and any point $X$ on the line is given by $X = \zeta E_4 + \eta X_\Pi$. After the transformation
the point is $TX = \mu_5 \zeta E_4 + \mu_1 \eta X_\Pi$, which still lies on $L$, i.e. any line through $E_4$ is a fixed line under
the isotropy. Consequently, since every point in $P^3$ lies on a line through $E_4$, the action of $T$ on $P^3$ is
to move points towards (or away from) $E_4$, with only $E_4$ and points on $\Pi$ remaining unchanged.
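
The fixed-point and fixed-line claims above can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original work: the numerical values of the parameters and of the test point are arbitrary illustrative choices, the plane is taken to be the one spanned by the first three eigenvectors, and all function names are hypothetical.

```python
from fractions import Fraction

def isotropy(mu1, mu2, mu5):
    """Collineation T of equation (16), as a list of rows."""
    return [[mu1, 0, 0, mu2],
            [0, mu1, 0, mu2],
            [0, 0, mu1, mu2],
            [0, 0, 0, mu5]]

def apply(T, X):
    """Apply the 4x4 matrix T to the homogeneous point X."""
    return [sum(T[r][c] * X[c] for c in range(4)) for r in range(4)]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def same_point(X, Y):
    """Equality up to scale: every 2x2 minor of [X; Y] vanishes."""
    return all(X[i] * Y[j] == X[j] * Y[i]
               for i in range(4) for j in range(i + 1, 4))

def collinear(X, Y, Z):
    """Three points of P^3 are collinear iff every 3x3 minor of [X; Y; Z] vanishes."""
    for drop in range(4):
        cols = [c for c in range(4) if c != drop]
        if det3([[P[c] for c in cols] for P in (X, Y, Z)]) != 0:
            return False
    return True

# Arbitrary illustrative values (assumptions, not taken from the paper).
mu1, mu2, mu5 = Fraction(2), Fraction(3), Fraction(7)
T = isotropy(mu1, mu2, mu5)
E4 = [mu2, mu2, mu2, mu5 - mu1]
X_plane = [Fraction(1), Fraction(-2), Fraction(5), Fraction(0)]  # a point with last coordinate 0
zeta, eta = Fraction(4), Fraction(1)
X = [zeta * e + eta * p for e, p in zip(E4, X_plane)]  # a point on the line through E4 and X_plane
TX = apply(T, X)

print("E4 is an eigenvector with eigenvalue mu5:", apply(T, E4) == [mu5 * e for e in E4])
print("plane point is fixed:", same_point(apply(T, X_plane), X_plane))
print("TX stays on the line through E4:", collinear(E4, X_plane, TX))
print("TX differs from X:", not same_point(TX, X))
```

Using `Fraction` avoids any floating-point tolerance in the equality-up-to-scale and collinearity tests.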

4  Experimental  Results 

The  images  used  for  acquisition  and  assessment  are  shown  in  figure  3. 

A  local  implementation  of  Canny’s  edge  detector  [3]  is  used  to  find  edges  to  sub-pixel  accuracy. 
These  edge  chains  are  linked,  extrapolating  over  any  small  gaps.  A  piecewise  linear  graph  is  obtained 
by incremental straight line fitting. Edgels in the vicinity of tangent discontinuities (“corners”) are
excised  before  fitting  as  the  edge  operator  localisation  degrades  with  curvature.  Vertices  are  obtained 
by  extrapolating  and  intersecting  the  fitted  lines.  Figure  4  shows  a  typical  line  drawing. 
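
The last step of this pipeline, obtaining a vertex by extrapolating and intersecting two fitted lines, can be sketched as follows. This is an illustration only and not the implementation used here: the orthogonal least-squares fit, the function names `fit_line` and `intersect`, and the synthetic edgel coordinates are all assumptions.

```python
import math

def fit_line(points):
    """Orthogonal least-squares line fit.

    Returns (a, b, c) with a*x + b*y + c = 0 and a*a + b*b = 1.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of the line
    a, b = -math.sin(theta), math.cos(theta)        # unit normal to the line
    return a, b, -(a * mx + b * my)

def intersect(l1, l2):
    """Intersection of two homogeneous lines (their cross product).

    Returns None when the lines are (nearly) parallel.
    """
    x = l1[1] * l2[2] - l1[2] * l2[1]
    y = l1[2] * l2[0] - l1[0] * l2[2]
    w = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(w) < 1e-12:
        return None
    return x / w, y / w

# Synthetic edgels on two sides of a corner; edgels near the corner itself are
# left out, mimicking the excision step described in the text.
side_1 = [(float(x), 1.0) for x in range(0, 5)]   # along y = 1
side_2 = [(5.0, float(y)) for y in range(2, 7)]   # along x = 5

vertex = intersect(fit_line(side_1), fit_line(side_2))
print(vertex)
```

Because the vertex comes from the extrapolated lines rather than from edgels near the corner, it does not depend on the edge operator's poorer localisation at high curvature.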

Although  invariants  obtained  from  two  views  are  fairly  stable,  improvements  in  stability  are 
achieved  by  augmenting  with  measurements  from  other  views.  At  present  this  is  carried  out  in  a 
primitive  fashion  by  determining  the  intersection  point  in  a  least  squares  manner.  See  table  2. 
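
The text does not spell out how the least-squares intersection is computed. As a generic illustration only, assuming that each additional view contributes one line that should pass through the sought point, the point minimising the sum of squared perpendicular distances to all the lines can be found from a 2x2 system of normal equations. The function name and the sample lines below are hypothetical.

```python
import math

def least_squares_intersection(lines):
    """Point (x, y) minimising the sum of squared distances to the lines.

    Each line is (a, b, c) with a*x + b*y + c = 0; lines are normalised
    internally so that the residuals are perpendicular distances.
    """
    saa = sab = sbb = sac = sbc = 0.0
    for a, b, c in lines:
        norm = math.hypot(a, b)
        a, b, c = a / norm, b / norm, c / norm
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; intersection is undefined")
    # Solve [saa sab; sab sbb] [x; y] = -[sac; sbc] by Cramer's rule.
    x = (-sac * sbb + sab * sbc) / det
    y = (-saa * sbc + sab * sac) / det
    return x, y

# Three noise-free lines through the point (2, 3).
sample_lines = [(1.0, 0.0, -2.0), (0.0, 1.0, -3.0), (1.0, 1.0, -5.0)]
point = least_squares_intersection(sample_lines)
print(point)
```

With noisy lines the same routine returns a compromise point, which is one plausible way for extra views to stabilise the measurement.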

5  Conclusions 

We have demonstrated that multiple view invariants can indeed be recovered without epipolar calibration
being necessary. The discussion applies as well to analogues of this configuration, for example:
four coplanar lines and two non-coplanar points, and five lines (four coplanar).

Figure 3: Images of a hole punch captured with different lenses and viewpoints. These are used for
structure acquisition and transfer evaluation.

Figure 4: Line drawing of the hole punch extracted from image A in figure 3. Points 1 and 5 are
occluded in this view.

We have also shown that for the structure with an isotropy it is not possible to determine the
epipolar geometry. We conjecture that this is always the case, i.e. if the 3D structure has an isotropy
under the projective group then it is not possible to determine the epipolar geometry (it can only be
constrained up to a family of solutions). Of course the converse is not true: four coplanar points and
n non-coplanar lines are not sufficient to determine the epipolar geometry for any n.


Appendix: Why are six points (four coplanar) sufficient?

With  8  points  or  more  it  is  possible  to  solve  for  the  matrix  Q  by  solving  a  set  of  linear  equations.  If 
there  are  fewer  than  8  points,  the  set  of  linear  equations  will  be  under-determined,  and  hence  there 
will  be  a  family  of  solutions.  It  is  instructive  to  consider  how  the  extra  condition  that  four  of  the 
points  should  be  coplanar  cuts  this  family  down  to  a  single  solution.  Let  us  consider  a  particular 
example. 

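Consider a set of 6 matched points $x'_i \leftrightarrow x_i$ as follows:

$$
\begin{aligned}
(1,0,0)^T &\leftrightarrow (1,0,0)^T \\
(0,1,0)^T &\leftrightarrow (0,1,0)^T \\
(0,0,1)^T &\leftrightarrow (0,0,1)^T \\
(1,1,1)^T &\leftrightarrow (1,1,1)^T \\
(1,0,0)^T &\leftrightarrow (-1,1,1)^T \\
(0,1,0)^T &\leftrightarrow (-1,1,1)^T
\end{aligned} \tag{17}
$$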


Assume that the first 4 points lie in a plane. From the previous discussion, it is obvious that the
epipole is the point $(-1,1,1)^T$, and hence that

$$
Q = S((-1,1,1)^T) = \begin{pmatrix}
0 & -1 & 1 \\
1 & 0 & 1 \\
-1 & -1 & 0
\end{pmatrix}
$$


However, we will compute $Q$ directly. Each of the six point correspondences gives rise to an equation
${x'_i}^{T} Q x_i = 0$ which is linear in the entries of $Q$. Since there are six equations in nine unknowns, there
will be a 3-parameter family of solutions. It is easily verified, therefore, that the general solution is
given by

$$
Q = \begin{pmatrix}
0 & A & -A \\
B & 0 & B \\
C & -C-2B & 0
\end{pmatrix} \tag{18}
$$
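
The count of free parameters and the form of the general solution can be checked mechanically. The sketch below is illustrative rather than part of the original derivation: it builds the six linear constraints from the correspondences in (17), taking the first point of each pair on the left of $Q$, computes the rank of the 6x9 system exactly, and confirms that the family (18) satisfies every constraint. All function names are hypothetical.

```python
from fractions import Fraction

# Matched points from (17): (point multiplying Q on the left, point on the right).
MATCHES = [
    ((1, 0, 0), (1, 0, 0)),
    ((0, 1, 0), (0, 1, 0)),
    ((0, 0, 1), (0, 0, 1)),
    ((1, 1, 1), (1, 1, 1)),
    ((1, 0, 0), (-1, 1, 1)),
    ((0, 1, 0), (-1, 1, 1)),
]

def constraint_row(xp, x):
    """Coefficients of the nine entries of Q (row-major) in xp^T Q x = 0."""
    return [Fraction(xp[r] * x[c]) for r in range(3) for c in range(3)]

def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [list(r) for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def general_solution(A, B, C):
    """The family of equation (18)."""
    return [[0, A, -A], [B, 0, B], [C, -C - 2 * B, 0]]

def residual(Q, xp, x):
    """Value of xp^T Q x."""
    return sum(xp[r] * Q[r][c] * x[c] for r in range(3) for c in range(3))

system_rank = rank([constraint_row(xp, x) for xp, x in MATCHES])
free_parameters = 9 - system_rank
residuals = [residual(general_solution(5, -2, 7), xp, x) for xp, x in MATCHES]

print("rank:", system_rank, "free parameters:", free_parameters)
print("residuals for (A, B, C) = (5, -2, 7):", residuals)
```

The values 5, -2, 7 are arbitrary; any choice of $A$, $B$, $C$ gives zero residuals.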


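Now, the condition $\det(Q) = 0$ yields an equation $2AB(C + B) = 0$, and hence either $C = -B$ or
$A = 0$ or $B = 0$. Thus, $Q$ has one of the forms

$$
Q = \begin{pmatrix}
0 & A & -A \\
B & 0 & B \\
-B & -B & 0
\end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix}
0 & 0 & 0 \\
B & 0 & B \\
C & -C-2B & 0
\end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix}
0 & A & -A \\
0 & 0 & 0 \\
C & -C & 0
\end{pmatrix} \tag{19}
$$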
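
The determinant identity behind these three cases can be confirmed by brute force over a small grid of integer values. This is a minimal sketch for checking the algebra, with hypothetical function names; it is not part of the original argument.

```python
from itertools import product

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def general_solution(A, B, C):
    """The three-parameter family of matrices Q derived above."""
    return [[0, A, -A], [B, 0, B], [C, -C - 2 * B, 0]]

grid = range(-3, 4)

# det(Q) should equal 2AB(C + B) for every choice of parameters.
mismatches = [(A, B, C) for A, B, C in product(grid, repeat=3)
              if det3(general_solution(A, B, C)) != 2 * A * B * (C + B)]

# Each of the three special cases should give a singular matrix.
case_c_equals_minus_b = all(det3(general_solution(A, B, -B)) == 0
                            for A, B in product(grid, repeat=2))
case_a_zero = all(det3(general_solution(0, B, C)) == 0
                  for B, C in product(grid, repeat=2))
case_b_zero = all(det3(general_solution(A, 0, C)) == 0
                  for A, C in product(grid, repeat=2))

print("mismatches:", mismatches)
print("singular cases:", case_c_equals_minus_b, case_a_zero, case_b_zero)
```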


We consider the first one of these solutions. Since $Q$ is determined only up to scale, we may choose
$B = 1$, and so

$$
Q = \begin{pmatrix}
0 & A & -A \\
1 & 0 & 1 \\
-1 & -1 & 0
\end{pmatrix} \tag{20}
$$


Next, we investigate the condition that the first four matched points lie in a plane. To do this, it
is necessary to find a pair of camera matrices that realize (see [Hartley92]) the matrix $Q$. It does
not matter which realization of $Q$ is picked, since any other choice will be equivalent to a projective
transformation of object space (see [7]), which will take planes to planes. Accordingly, since $Q$ factors
as

$$
Q = \begin{pmatrix}
-A & & \\
 & 1 & \\
 & & 1
\end{pmatrix}
\begin{pmatrix}
0 & -1 & 1 \\
1 & 0 & 1 \\
-1 & -1 & 0
\end{pmatrix}
$$

a realization of $Q$ is given by the two camera matrices

$$
M = (I \mid 0) \quad\text{and}\quad
M' = \begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & -A & 0 & A \\
0 & 0 & -A & A
\end{pmatrix}
$$

Then it is easily verified that the points

$$
X_1 = (1,0,0,0)^T, \quad X_2 = (0,1,0,0)^T, \quad X_3 = (0,0,1,0)^T, \quad X_4 = (1,1,1,k)^T,
$$

where $k$ is defined by $1 + k = -A + kA$, are mapped by the two cameras to the required image points
as specified by (17). However, the requirement that these four points lie in a plane means that $k = 0$
and hence that $A = -1$. Substituting this value in (20) yields the expected matrix $Q = S((-1,1,1)^T)$.
It may be verified that the two other choices for $Q$ given in (19) do not lead to any further solution.
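
The role of $k$ and the value $A = -1$ can be checked numerically. The sketch below is an illustration under stated assumptions: the second camera is taken as the matrix with rows $(1,0,0,1)$, $(0,-A,0,A)$, $(0,0,-A,A)$, which is one realization consistent with the condition $1 + k = -A + kA$ and with (20), and the epipolar constraint is written with the second-camera image on the left of $Q$. Function names and test values are hypothetical.

```python
from fractions import Fraction

def q_matrix(A):
    """Q of equation (20)."""
    return [[0, A, -A], [1, 0, 1], [-1, -1, 0]]

def cameras(A):
    """Assumed realization: M = (I | 0) and a second camera depending on A."""
    M = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    M_prime = [[1, 0, 0, 1], [0, -A, 0, A], [0, 0, -A, A]]
    return M, M_prime

def project(M, X):
    return [sum(M[r][c] * X[c] for c in range(4)) for r in range(3)]

def proportional(u, v):
    """Equality of image points up to scale."""
    return all(u[i] * v[j] == u[j] * v[i]
               for i in range(3) for j in range(i + 1, 3))

def k_from_A(A):
    """Solve 1 + k = -A + k*A for k (requires A != 1)."""
    A = Fraction(A)
    return (A + 1) / (A - 1)

def epipolar_residual(A, X):
    """Value of x'^T Q x for the images of the space point X."""
    M, M_prime = cameras(A)
    x, xp = project(M, X), project(M_prime, X)
    Q = q_matrix(A)
    return sum(xp[r] * Q[r][c] * x[c] for r in range(3) for c in range(3))

# For a generic A (here 2), X4 projects correctly but is not in the plane of X1..X3.
A = Fraction(2)
k = k_from_A(A)
M, M_prime = cameras(A)
X4 = [1, 1, 1, k]
images_ok = (proportional(project(M, X4), [1, 1, 1])
             and proportional(project(M_prime, X4), [1, 1, 1]))

# Q is consistent with the cameras for arbitrary space points.
test_points = [[3, -1, 4, 1], [2, 7, -5, 3], [0, 1, 1, -2]]
residuals = [epipolar_residual(A, X) for X in test_points]

print("k for A = 2:", k, "| X4 images as (1,1,1):", images_ok)
print("epipolar residuals:", residuals)
print("k for A = -1:", k_from_A(-1))
```

Since $X_1$, $X_2$, $X_3$ all have last coordinate zero, $X_4$ is coplanar with them only when $k = 0$, which the final line shows corresponds to $A = -1$.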

The role of the coplanarity condition now becomes clear. Without this condition, there is a
family of solutions for the essential matrix $Q$. Only one of the family of solutions is consistent with
the condition that the four points lie in a plane.